1. Introduction
China stands as the world’s foremost producer and consumer of tea, with tea plantations spanning approximately 3.381 million hectares, about 62.16% of the global total. These plantations are predominantly situated in hilly terrain between 18° and 37° N latitude and 98° and 123° E longitude [1].
In 2023, China’s dry tea production reached 3.3395 million tons, with a total output value of 329.668 billion yuan, year-on-year increases of 4.98% and 3.65%, respectively [2]. Such growth is closely linked to the expansion of tea plantations, often at the expense of forestland and farmland converted to tea cultivation for short-term economic gains [3,4]. Although this expansion boosts production, it harms biodiversity and disrupts the ecological equilibrium, underscoring the ongoing conflict between agricultural expansion and environmental conservation [5,6]. Consequently, there is a compelling need for dynamic monitoring and sustainable management practices grounded in scientific methods.
Traditional monitoring techniques, predominantly based on field surveys, are costly, inefficient, and slow to update [7]. While UAV imagery improves accuracy at local scales, its high operational costs and limited coverage render it impractical for extensive areas [8,9]. Satellite remote sensing offers a superior alternative, providing extensive coverage, timely updates, and reduced costs [10]. Furthermore, integrating remote sensing with phenological observations enables tracking of the tea growth cycle, thereby supporting precision agriculture [11].
Extensive research has explored the correlation between crop types and various features, including spectral, temporal, texture, and phenological indices [12,13]. Machine learning techniques such as Support Vector Machine (SVM) [10,14], Random Forest (RF) [2,15,16], and Decision Tree (DT) [17] are commonly employed. These methods are straightforward to apply but depend heavily on expert knowledge for feature selection and integration. For instance, Dihkan et al. combined spectral and texture features with SVM to achieve an overall accuracy of 97.4% [18]. Similarly, Xiong et al. applied SVM with feature selection to tea plantations in Fujian Province, achieving a sampling-point accuracy of 94.65% [19]. However, such overall accuracy metrics are not always reliable, and the generalizability of sampling accuracy can be limited [20]. A significant limitation of machine learning in this context is its focus on pixel-level analysis, which overlooks spatial relationships and is susceptible to errors from spectral confusion, thereby diminishing the accuracy of feature extraction [21].
Deep learning has created new opportunities for tea plantation mapping [22,23,24]. Tang [25] first combined machine learning and deep learning, proposing an object-oriented CNN model that achieved 86% accuracy in Anxi County, Fujian. Wei [26] compared SVM, KNN, and ResNet in Yunnan tea plantations; the results demonstrated that deep learning significantly outperformed machine learning in accuracy, F1-score, and recall. Numerous studies have also incorporated multi-temporal imagery. Yao [27] developed an R-CNN model using multi-temporal Sentinel-2 data in Xinchang County, reaching an overall accuracy of 95.3% and underscoring the benefits of multi-temporal data. Multi-source data fusion has emerged as a further trend: Zhou et al. [23] integrated multi-temporal Sentinel-2 optical data, Sentinel-1 SAR, and a DEM in Anji County, Zhejiang, achieving an accuracy of 98.9%. However, tea plantations are often fragmented and irregular in shape, and low-resolution images frequently produce mixed pixels along edges, increasing error rates and diminishing reliability.
High-resolution imagery is instrumental in capturing the textural features of tea [25], while phenological data can mitigate spectral confusion [28,29]. Based on Gaofen-2 (GF-2) images, this study aims to improve the fine-grained segmentation of tea plantations in Hangzhou by integrating phenological dynamics into a dedicated deep-learning framework. We develop a phenology-aware segmentation model that leverages multi-temporal seasonal patterns to enhance the discrimination between tea and other vegetation types, thereby increasing mapping accuracy under complex landscape conditions. The proposed framework is further used to produce detailed distribution, structural, and density maps of tea plantations across Hangzhou, providing an important reference for regional plantation management and ecological assessment.
2. Materials and Methods
2.1. Study Area
This investigation centers on Hangzhou, Zhejiang Province, China (Figure 1). Positioned in southeastern China, within the southern wing of the Yangtze River Delta, Hangzhou spans 29°11′ N to 30°33′ N and 118°21′ E to 120°30′ E. Covering approximately 1.66 million hectares, it serves as Zhejiang’s hub for economic, cultural, and educational activity. Hangzhou is also a primary tea-producing region in China, often referred to as the “Tea Capital of China,” and home of the celebrated West Lake Longjing tea.
The topography of Hangzhou is varied, encompassing hills, mountains, and plains, with the terrain descending from southwest to northeast. The western, central, and southern regions are predominantly hilly, accounting for 65.6% of the total area, whereas plains are primarily situated in the northeast, comprising 26.4% of the land. Tea plantations are chiefly located in the hilly regions of the west and south. The climate is characterized by a subtropical monsoon, featuring four distinct seasons where rain and heat coincide. Average annual temperatures range between 16 °C and 17.5 °C, with annual rainfall between 1270 mm and 1450 mm, peaking during the plum rain season in June and the summer months. These geographical and climatic conditions collectively create favorable natural settings for tea cultivation.
2.2. Dataset
To obtain high-resolution spatial information, this study uses GF-2 imagery as the primary data source for detailed mapping; its high spatial resolution enables accurate identification of tea plantations and precise delineation of their boundaries. The study combines remote sensing data with field surveys. A total of 25 GF-2 scenes, providing 0.8 m panchromatic and 3.2 m multispectral resolution, were acquired. The original images were processed through dehazing enhancement, radiometric calibration, and atmospheric correction, resulting in a 0.8 m resolution dataset covering the entire Hangzhou region. These high-resolution data were used to identify plantations, delineate boundaries, and select validation samples. Additionally, Sentinel-2 multispectral data were used to capture the phenological variations that differentiate tea from other vegetation. On the Google Earth Engine (GEE) platform, all available Sentinel-2 images from 2023 covering Hangzhou were preprocessed: scenes with more than 30% cloud cover were excluded, remaining cloud-contaminated pixels were removed through bitwise masking on the QA60 band, and monthly cloud-free mosaics were generated by median compositing to ensure temporal consistency for phenological analysis. Leveraging these datasets, we developed the THSI, which captures the seasonal dynamics of tea cultivation and enhances its separability from other land covers.
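The QA60 masking and median-compositing steps can be sketched with numpy (a minimal illustration assuming the standard Sentinel-2 QA60 convention, in which bit 10 flags opaque clouds and bit 11 flags cirrus; the actual processing was run on GEE):

```python
import numpy as np

# Assumed QA60 bit positions: bit 10 = opaque cloud, bit 11 = cirrus.
CLOUD_BIT = 1 << 10
CIRRUS_BIT = 1 << 11

def clear_sky_mask(qa60: np.ndarray) -> np.ndarray:
    """Return True where a pixel is free of both opaque cloud and cirrus."""
    return (qa60 & (CLOUD_BIT | CIRRUS_BIT)) == 0

def monthly_median_composite(stack: np.ndarray, qa60_stack: np.ndarray) -> np.ndarray:
    """Median-composite a (scenes, H, W) reflectance stack for one month,
    ignoring cloud-flagged pixels so they do not contaminate the mosaic."""
    masked = np.where(clear_sky_mask(qa60_stack), stack, np.nan)
    return np.nanmedian(masked, axis=0)
```

The same bitwise test is what GEE's `updateMask` applies per pixel; median compositing then keeps the mosaic robust to residual outliers.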
Hangzhou features diverse landscapes, including the renowned Longjing tea hills, plains adjacent to urban areas, and mixed environments with built-up land. To ensure robustness and generalization, this diversity was considered during sampling. Annotations were made in ArcGIS 10.8 through visual interpretation and digitization, using GF-2 imagery, historical plantation maps, and field surveys as references. This process generated 45,486 image tiles of 512 × 512 pixels.
In an effort to balance the classes and enhance training efficiency, all 12,982 positive tiles containing tea plantations were retained. These samples encompass both extensive, continuous plantations and fragmented small plots. To prevent overrepresentation of negative features, a random selection strategy was employed, resulting in 6518 representative negative tiles. Visual inspection showed that these negative samples included forests, other vegetation types, built-up areas, bare land, and water bodies. The final dataset comprised 19,500 tiles, divided into training, validation, and test sets in an 8:1:1 ratio, respectively consisting of 15,600, 1950, and 1950 tiles. This distribution was designed to ensure effective training and reliable evaluation.
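The 8:1:1 division can be sketched as follows (an illustrative helper; the function name and fixed seed are assumptions, not the authors' code):

```python
import random

def split_tiles(tile_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle tile IDs and split them into train/val/test lists by ratio."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)          # deterministic shuffle
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Applied to the 19,500 tiles, this yields the 15,600 / 1950 / 1950 partition reported above.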
2.3. Methods
2.3.1. Network Architecture
In this study, we introduced a segmentation network structured upon a multi-task learning framework, integrating a Swin Transformer encoder with a context fusion decoder. The proposed architecture comprises three concurrent pathways: the Swin Transformer encoder, a phenology context branch, and a multi-scale decoder equipped with a fusion module and dual output heads. This configuration aims to address two prevalent issues in tea plantation mapping: spectral confusion and blurred boundary delineation.
The encoder takes high-resolution multispectral imagery as input, preserving detailed spatial information throughout feature extraction. Built on the Swin Transformer, its shifted-window attention mechanism captures two key characteristics of tea plantations. First, it extracts the fine-grained textures created by their row-based planting pattern, which produces distinctive stripe-like canopy structures. Second, it models the large-scale spatial dependencies that reflect the typical distribution of tea plantations, which are commonly located in mountainous regions farther from urban areas. The encoder generates four feature maps at varying scales, and these high-resolution details are relayed to the decoder via skip connections.
Simultaneously, a lightweight CNN constitutes the phenology context branch, processing the phenology index map to extract semantic features across multiple scales. The primary objective of this branch is to distill temporal prior knowledge, rather than preserving spatial details, thereby preventing the direct amalgamation of low-resolution phenological data with high-resolution image features.
The decoder adopts a progressive up-sampling scheme and employs the Phenology-Guided Fusion Module (PGFM) rather than simple concatenation. This module combines the decoder and phenology context features to produce a spatial attention map that reweights the decoder features; the reweighted features are then merged with the high-resolution features from the skip connections. Finally, the network uses dual output heads, one for segmentation and one for edge prediction, establishing a robust multi-task framework. The edge prediction head compensates for the interference introduced when low-resolution phenological features are fused with high-resolution spatial data: by explicitly learning boundary-aware representations, it suppresses noise from coarse phenology cues and refines object contours for more accurate tea plantation delineation. The architecture of the network is depicted in Figure 2.
2.3.2. Fusion Module
The PGFM represents a pivotal innovation within this framework. It amalgamates three distinct types of features: the low-resolution semantic features from the decoder, the high-resolution spatial features from the encoder, and the prior knowledge derived from the phenology branch.
Contrary to traditional approaches that employ gated skip connections, the PGFM passes encoder features through unmodified. At each decoder stage, the fusion proceeds dynamically. First, decoder features are concatenated with phenology features. This combined set is processed by a small convolutional attention network, which generates a spatial attention map. Element-wise multiplication by this map reweights the decoder features. The weighted features are then concatenated with the high-resolution encoder features, and a convolutional block refines the result. This integration leverages phenological knowledge while preserving fine spatial detail, enabling precise segmentation in complex landscapes. The internal structure of the PGFM is illustrated in Figure 3.
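The fusion steps above can be sketched as a PyTorch module (channel widths, layer depths, and the normalization choice are illustrative assumptions, not the authors' exact configuration):

```python
import torch
import torch.nn as nn

class PGFM(nn.Module):
    """Sketch of the Phenology-Guided Fusion Module described above."""

    def __init__(self, dec_ch, phen_ch, enc_ch, out_ch):
        super().__init__()
        # Small convolutional attention network: decoder + phenology features
        # -> single-channel spatial attention map in [0, 1].
        self.attn = nn.Sequential(
            nn.Conv2d(dec_ch + phen_ch, dec_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dec_ch, 1, 1),
            nn.Sigmoid(),
        )
        # Refinement block applied after concatenating the reweighted decoder
        # features with the unmodified high-resolution encoder skip features.
        self.refine = nn.Sequential(
            nn.Conv2d(dec_ch + enc_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, dec_feat, phen_feat, enc_feat):
        attn_map = self.attn(torch.cat([dec_feat, phen_feat], dim=1))
        weighted = dec_feat * attn_map            # element-wise reweighting
        fused = torch.cat([weighted, enc_feat], dim=1)
        return self.refine(fused)
```

Note that the encoder skip features enter the module unweighted, consistent with the "unmitigated transfer" of encoder features described above; only the decoder stream is modulated by the phenology-derived attention.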
2.3.3. Phenology Index Construction
Tea plantations are characterized as evergreen crops that are densely planted and intensively managed. Their distinctive remote sensing signatures differ markedly from those of seasonal farmland and natural forests. To accurately capture these unique features, we developed a set of spectral indices.
Initially, we selected the two-band Enhanced Vegetation Index (EVI2) [30], which outperforms NDVI over dense vegetation canopies by avoiding signal saturation and thus more precisely reflects the high biomass characteristic of tea plantations. Next, we incorporated the Red-edge Chlorophyll Index (CIre) [31], which exploits the red-edge spectral band and is highly sensitive to chlorophyll content, effectively monitoring the physiological activity of tea shoots. Finally, we employed the Bare Soil Index (BSI) [32] to quantify the exposed soil between rows in tea plantations, distinguishing them from forests where the soil is typically fully covered. The calculations for these indices are specified in Equations (1)–(3).
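Assuming the commonly used formulations of these three indices (inputs are surface-reflectance values; Equations (1)–(3) give the authors' exact definitions), their computation can be sketched as:

```python
def evi2(nir, red):
    """Two-band Enhanced Vegetation Index (standard formulation)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

def cire(nir, red_edge):
    """Red-edge Chlorophyll Index: sensitive to canopy chlorophyll."""
    return nir / red_edge - 1.0

def bsi(swir, red, nir, blue):
    """Bare Soil Index: high where soil is exposed between tea rows."""
    return ((swir + red) - (nir + blue)) / ((swir + red) + (nir + blue))
```

For Sentinel-2, NIR, red, red-edge, blue, and SWIR would typically map to bands B8, B4, B5, B2, and B11, though the specific band choices here are an assumption.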
To synthesize these indices and emphasize the unique spectral response of tea during the spring harvest, we proposed the Vegetation Phenology Separation Index (VPSI), calculated as delineated in Equation (4).
The methodology for calculating VPSI is depicted in Figure 4.
To elucidate seasonal differences, we sampled five land-cover types across Hangzhou: tea plantations, forests, croplands, water bodies, and built-up areas. We then plotted the annual VPSI curves for these land covers. As illustrated in Figure 5, tea plantations exhibit a distinct seasonal pattern.
The VPSI curve for tea plantations decreases from the beginning of the year, reaching a nadir in May, and subsequently rises over the following months. This pattern coincides with intensive harvesting and pruning during April–May [33], when young shoots are removed, reducing vegetation greenness and chlorophyll content and increasing bare-soil exposure. These changes manifest as a pronounced “valley” in the VPSI curve.
To quantify this seasonal variation, we introduced the THSI, which measures the disparity between VPSI values during the growing and dormant seasons, as shown in Equation (5).
Here, VPSI_t represents the VPSI value for month t. This index aggregates approximately 10 months of time-series data, effectively distinguishing tea plantations by their characteristic harvest-recovery cycle. It also differentiates them from forests, which exhibit smoother phenological curves, and from seasonal farmland, which follows different temporal dynamics. The resultant THSI map serves as a critical input to the phenology attention module, providing essential spatial prior knowledge.
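One plausible realization of this growing-versus-dormant disparity can be sketched as follows (the month groupings and the aggregation by means are assumptions for illustration; Equation (5) gives the authors' exact definition):

```python
import numpy as np

def thsi(vpsi_by_month, growing=(3, 4, 5, 6, 7), dormant=(10, 11, 12, 1, 2)):
    """Illustrative THSI: mean growing-season VPSI minus mean dormant-season
    VPSI. `vpsi_by_month` maps month number (1-12) to a VPSI value; the month
    groupings shown are hypothetical."""
    grow = np.mean([vpsi_by_month[m] for m in growing])
    dorm = np.mean([vpsi_by_month[m] for m in dormant])
    return grow - dorm
```

For tea, the April–May harvest "valley" pulls the growing-season mean down relative to the flat dormant-season baseline, giving a magnitude of seasonal contrast that evergreen forests (flat curves) and seasonal crops (different timing) do not reproduce.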
2.3.4. Accuracy Evaluation
To assess the performance of the model, we employed multiple metrics, including Overall Accuracy (OA), Precision, Recall, F1-score, and Mean Intersection over Union (mIoU). The definitions of these metrics are provided in Equations (6)–(10).
OA quantifies the proportion of pixels accurately classified. Precision is the ratio of true positives to predicted positives, while Recall is the ratio of true positives to all actual positives. The F1-score is the harmonic mean of Precision and Recall. mIoU evaluates the degree of overlap between the prediction and the ground truth.
Within these equations, TP (True Positive) denotes correctly predicted tea pixels, FP (False Positive) signifies non-tea pixels incorrectly classified as tea, FN (False Negative) refers to tea pixels mistakenly identified as non-tea, and TN (True Negative) indicates non-tea pixels correctly classified. These metrics collectively provide a thorough assessment of the segmentation performance.
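From these four counts, the metrics can be computed as in the following sketch (treating mIoU as the average of the tea-class and background IoU, an assumption consistent with the binary setting of Equations (6)–(10)):

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Compute OA, Precision, Recall, F1, and binary mIoU from pixel counts."""
    oa = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_tea = tp / (tp + fp + fn)        # IoU of the tea class
    iou_bg = tn / (tn + fp + fn)         # IoU of the background class
    miou = (iou_tea + iou_bg) / 2
    return {"OA": oa, "Precision": precision, "Recall": recall,
            "F1": f1, "mIoU": miou}
```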
2.4. Experimental Setup
The experiments were conducted using the PyTorch 2.4.1 framework, with an NVIDIA GeForce RTX 3090 GPU used for both training and inference.
Regarding the loss function, two outputs were optimized independently. The primary segmentation head generated a binary mask distinguishing between tea and background. Given the class imbalance, we implemented Focal Loss, as delineated in Equation (11).
In this context, p_t represents the probability predicted by the model for the correct class, and γ is the focusing parameter. Focal Loss decreases the influence of easily classified samples, concentrating the model’s effort on more challenging ones. This adjustment enhances the segmentation of minority classes, such as tea plantations.
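The down-weighting behavior can be illustrated for a single pixel (γ = 2 is an assumed value for illustration; the paper's setting appears in Equation (11)):

```python
import math

def focal_loss(p_t, gamma=2.0):
    """Focal loss for one pixel: -(1 - p_t)**gamma * log(p_t), where p_t is
    the predicted probability of the correct class."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

An easy pixel (p_t = 0.9) is damped by the (1 − p_t)² factor to a small fraction of its plain cross-entropy value, while a hard pixel (p_t = 0.1) retains most of its loss, so gradient mass shifts toward the hard, minority-class pixels.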
For boundary prediction, an auxiliary edge head was employed to produce boundary maps. This component was trained using the binary cross-entropy with logits loss (BCEWithLogitsLoss), as detailed in Equation (12).
Here, y denotes the ground-truth edge label, x represents the model output logits, and σ(·) is the sigmoid function. This mechanism provides explicit boundary supervision and mitigates the spatial blurring caused by low-resolution phenology data and patch-based Transformer features.
The total loss function combined both segmentation and edge losses, which were weighted according to Equation (13).
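The joint objective can be sketched in PyTorch as follows (the focal term recovered via exp(−BCE) and γ = 2.0 are assumptions; λ = 6.0 follows the sensitivity experiments reported in this section):

```python
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_target, edge_logits, edge_target,
                   gamma=2.0, lam=6.0):
    """Sketch of the joint objective: focal segmentation loss plus
    lambda * BCE-with-logits edge loss (cf. Equations (11)-(13))."""
    # Focal loss on the binary segmentation head. Since per-pixel BCE equals
    # -log(p_t), exp(-bce) recovers p_t for the focal modulating factor.
    bce = F.binary_cross_entropy_with_logits(seg_logits, seg_target,
                                             reduction="none")
    p_t = torch.exp(-bce)
    seg_loss = ((1.0 - p_t) ** gamma * bce).mean()
    # BCE-with-logits on the auxiliary edge head.
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_target)
    return seg_loss + lam * edge_loss
```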
To optimize the weight λ, which balances the segmentation and edge tasks, sensitivity experiments were performed. We tested multiple values, each replicated three times over 20 epochs, and recorded the average F1-score on the validation set. Optimal performance was achieved when λ = 6.0, as illustrated in Figure 6.
Through this multi-task joint optimization strategy, the model learns precise and continuous boundary representations while effectively segmenting regions. This approach significantly enhances both the quality and the accuracy of the final segmentation contours.
3. Results
3.1. Comparative Experiments
We compared PGSUNet with seven well-known semantic segmentation networks: DeepLabV3+ [34], HRNet [35], SegFormer [36], SegFormer-B2 [36], Swin-UNet [37], K-Net [38], and MaskFormer [39]. All models were trained under identical conditions using the same datasets, image resolutions, and hyperparameter settings to ensure fairness.
The quantitative results are presented in Table 1, ranked by F1-score. The best performance is highlighted in bold; the second-best is underlined. PGSUNet outperformed all other models across all metrics: its F1-score was 4.35% higher than that of the second-best model, and its precision increased by more than 6%. This improvement underscores the effectiveness of the phenology-guided fusion and edge supervision techniques.
The qualitative results are depicted in Figure 7. We selected three representative scenarios (forest areas, croplands, and wetlands) that are often confused with tea plantations. Comparing the classification results with the ground truth shows that even the best-performing networks exhibit limitations in these challenging scenes. The first case is tea plantations located in forest areas, where plantation boundaries are irregular and often fragmented by roads; here, DeepLabV3+ and MaskFormer show boundary adhesion and edge blurring. The second case is farmland, where ridge-planted vegetable seedlings exhibit spectral and textural characteristics similar to tea plantations; models such as Swin-UNet and HRNet tend to produce misclassifications. The third case is wetland, which contains abundant grasses and aquatic vegetation spectrally similar to tea; here, Swin-UNet and DeepLabV3+ also show obvious false detections.
In contrast, PGSUNet, through the use of the THSI index, captured the phenological rhythm and reduced spectral confusion. Moreover, it produced sharper boundaries due to edge supervision. These visual comparisons further validate the robustness of our proposed model.
3.2. Ablation Study
To validate the effectiveness of the phenology-guided fusion and edge supervision, we conducted ablation experiments. Four variants were tested: (1) Baseline (Swin-UNet with 4-band input), (2) 5-Band Concatenate (adding a phenology index as a fifth band directly), (3) Proposed model without edge supervision, (4) Full PGSUNet model.
The results, displayed in Table 2, show that the Baseline model performed poorly due to the absence of phenological information. The 5-Band Concatenate model improved on this, demonstrating the value of phenology indices; however, simple concatenation lost detail owing to the mismatch between the low-resolution phenology data and the high-resolution images. The proposed fusion strategy mitigated this limitation and enhanced accuracy, and adding edge supervision in the Full PGSUNet model yielded significantly better performance, with higher F1-scores and mIoU than the variant without edge supervision.
Both the comparative and ablation experiments confirm that our proposed method is highly effective in complex land-cover situations. By integrating temporal information from phenology with spatial details from high-resolution images, our method addresses the primary challenges faced in traditional remote sensing classification. It excels particularly when spectral features are similar and spatial textures are complex in agricultural scenes.
3.3. Large-Scale Extraction
We applied the trained PGSUNet to extract tea plantations across the entire Hangzhou area. The final extraction results are presented in Figure 8.
The distribution map indicates that tea plantations are concentrated in the West Lake area and extend to Yuhang and Fuyang. Smaller plantations were also detected in Jiande and Chun’an. The density map emphasizes the aggregation of plantations in the core West Lake production region, while those in outer counties are more dispersed.
According to official statistics, the total area of tea plantations in Hangzhou is approximately 37,156 hectares. Chun’an accounts for the largest area (12,943 ha), followed by Fuyang (4786 ha), and then Jiande, Yuhang, and Tonglu. Our extraction results align with these statistics, confirming both the concentration in core production areas and the presence of scattered plantations in peripheral counties. This alignment demonstrates the accuracy and reliability of our method for large-scale extraction.
When examining the plantation density map (bottom right of Figure 8), it is evident that the core production areas, led by the West Lake District, exhibit highly concentrated planting, whereas the outer regions are characterized by scattered, lower-density plantations. This spatial pattern partly reflects the historical formation and development of the traditional tea industry: the West Lake Longjing production area has long relied on favorable natural conditions and a strong cultural foundation to develop an intensive core production zone, while surrounding counties mainly engage in dispersed planting and serve as supplementary production areas.
5. Conclusions
This study addresses the significant challenge of extracting tea plantations, which often exhibit spectral similarity and irregular boundaries, by employing GF-2 images of Hangzhou. We proposed and validated a phenology-guided segmentation approach, yielding several key findings: (1) Tea plantations exhibit a distinct phenological dip during the spring harvest and pruning periods. (2) The newly proposed THSI index effectively diminishes the confusion between tea plantations and other land-cover types. (3) The PGSUNet model, which integrates phenology-guided fusion with edge supervision, enhances both edge delineation and overall segmentation accuracy. Relative to the second-best performing model, the PGSUNet improved the F1-score by 4.35% and increased precision by 6.09%. (4) The large-scale extraction results reveal the spatial distribution of tea plantations in Hangzhou, which correlates well with official statistics, thus confirming the potential of high-resolution remote sensing and PGSUNet in crop mapping. (5) Phenological variations associated with elevation lead to index variations, suggesting that future research should include DEM-based corrections to improve the model’s applicability in complex terrains.