1. Introduction
Potatoes are recognized for their reduced greenhouse gas emissions and water requirements, conferring substantial environmental advantages over other key agricultural crops [1]. In light of these benefits, China has implemented initiatives to establish potatoes as a dietary staple, with the goal of narrowing the yield gap with leading producers. These measures are anticipated to reduce the carbon-land-water impact of staple crops by 17–25% by 2030 [1,2]. Late blight, caused by the pathogen Phytophthora infestans, is a principal disease of potato cultivation, resulting in considerable yield losses [3,4] and thus undermining the role of potatoes in sustainable development. Consequently, extensive mapping of the prevalence of potato late blight (PLB) is essential for crisis management, effective surveillance, and optimal disease management strategies. However, traditional disease monitoring depends primarily on field surveys, which are laborious, time-consuming, and ill-suited to real-time monitoring over extensive regions [5].
Given the limitations of traditional monitoring methods, advanced technologies are increasingly being explored. Previous studies have shown that, at the leaf level and under laboratory conditions, hyperspectral imaging can detect PLB early and non-destructively, even before leaf symptoms become visible [6,7,8]. After remote sensing spectroscopy proved useful for studying PLB, subsequent work mostly combined it with unmanned aerial vehicles (UAVs) for monitoring. For example, some studies have used drones for large-scale surveillance of PLB. One approach assesses PLB severity from RGB images captured by drones [9]. Another evaluates the extent of late blight using high-resolution multispectral UAV images combined with machine learning (ML) algorithms [10]. A deep learning model known as CropdocNet has been proposed for accurately and automatically diagnosing late blight from UAV hyperspectral images [11]. Additionally, Sun et al. (2023) explored collaborative PLB monitoring using multiple drone sensors [12]. These studies demonstrate the feasibility of combining remote sensing spectral information with ML or deep learning algorithms to detect PLB [12]. The methods include multi-layer perceptrons, convolutional neural networks (CNNs), support vector regression (SVR), and Random Forest (RF) applied to multispectral UAV imagery. The results indicate that ML algorithms can achieve high accuracy, with an acceptable mean absolute error of 11.72%, suggesting their potential to replace traditional visual estimation [12]. Despite the success of UAV technology and ML algorithms, their capacity for large-scale disease monitoring remains limited in continuity and coverage. In contrast, satellite remote sensing offers an economical and efficient means of large-scale characterization of ground objects and long-term change detection [13], making it a valuable tool for disease monitoring. Existing studies of satellite-based PLB monitoring have confirmed that the relationship between spectra and PLB varies with scale, but that Sentinel-2 multispectral monitoring of PLB is feasible and holds great potential [14].
Furthermore, crop distribution data serve as the foundation for precise disease monitoring. Few methods and models currently exist for mapping potato distribution; nevertheless, inspiration can be drawn from identification algorithms for other crops [15,16,17,18,19]. The Google Earth Engine (GEE) platform, in conjunction with Sentinel-1/2 data, has effectively used phenology data and ML techniques to create continuous, inter-annual distribution maps of oilseed rape at 10-m resolution [20]. Object-based methods and deep learning algorithms have provided a novel perspective on crop-type mapping in the northwest Ardabil region of Iran [21]. A multi-temporal Gaussian mixture model developed on the GEE platform was applied to efficiently map maize distributions [22]. Higher-resolution mapping of corn and soybean in China was achieved with PlanetScope data and an RF-based method [23]. These successful cases mainly use data from optical satellites, which offer high spatial and temporal resolution covering the visible, near-infrared (NIR), and shortwave-infrared (SWIR) bands. These bands capture the distinct spectral characteristics of crops, aiding crop classification and monitoring [24,25].
While optical remote sensing images have garnered considerable attention, the full potential of Synthetic Aperture Radar (SAR) remains largely untapped. SAR sensors can acquire images under virtually all weather conditions, making them ideal for long-term, multi-seasonal monitoring. SAR also holds extensive potential for agricultural monitoring [26], as it provides important information about crop growth and soil moisture conditions [25]. Variations in radar backscatter across growth seasons can further aid crop-type detection, as different growth stages produce distinct scattering mechanisms, including surface, corner, and volume scattering. Notably, C-band SAR data from the RADARSAT satellite have shown superior capability in differentiating rice and potato crops in the Ganges Plain, attaining classification accuracies of 94% for rice and 93% for potatoes during the early phases of their development [27]. By fusing multi-temporal Sentinel-1 SAR data with Sentinel-2 optical data and applying a 3D U-Net deep learning network, researchers have significantly improved the accuracy of crop-type mapping, providing an efficient monitoring and classification method for data-driven agriculture [28]. SAR and multispectral data have also been used successfully to extract crop-specific land surface phenology for major crops in Europe; validation against ground truth confirmed the accuracy, demonstrating the complementary nature of these data types in delivering crop phenology insights and bolstering the precision and dependability of crop surveillance [29]. These successful cases of crop mapping using multi-source time series data and phenological information provide a solid foundation for PLB distribution mapping [30,31].
Moreover, conventional techniques for processing remote sensing data at large scales are intricate, involving complexities across spatial and temporal dimensions [32]. Large-scale crop mapping entails processing extensive volumes of diverse satellite imagery from multiple sensors, posing considerable 'big data' challenges. As remote sensing technology advances, monitoring crop disease with large volumes of satellite imagery has become feasible, especially with the emergence of the GEE platform. GEE provides powerful big data analysis capabilities and access to high-resolution imagery on a global scale, and has shown excellent performance in accessing and pre-processing remote sensing products in the cloud [33,34], enabling extensive surveillance and mapping of PLB [35]. Unrestricted access to satellite-based global observations offers a supplementary, low-cost method to consistently acquire large-scale PLB information in both space and time, providing a new solution for potato production management [36,37]. Despite previous success in crop monitoring and distribution mapping based on GEE satellite remote sensing, research on identifying and mapping crop disease at large scales remains limited. Most existing studies rely on UAV-scale hyperspectral and multispectral modeling for PLB monitoring, leaving reliable, freely available satellite data underused for large-scale, efficient disease monitoring.
At present, remote sensing technology is increasingly used for agricultural monitoring, but challenges remain in large-scale detection and spatial visualization of PLB [14,18]. Moreover, no remote sensing-based potato distribution map exists for China, and research on large-scale PLB monitoring methods is limited. This study aims to develop a large-scale, GEE-based PLB monitoring method to improve the accuracy and efficiency of disease identification and provide real-time information support for agricultural disease management. The specific objectives are (1) precise identification of potato planting areas using multi-source time series remote sensing data and the K-Means clustering algorithm; (2) evaluation of Sentinel-1 and Sentinel-2 data fusion for disease surveillance, revealing the complementarity of the two data sources; and (3) provision of a machine learning-based PLB monitoring framework, offering a scientific basis for sustainable agricultural development and reducing yield losses caused by PLB.
3. Results
3.1. Spatial Clustering Analysis Based on K-Means Algorithm and Sentinel 1/2 Time-Series Data
The monthly time series data from Sentinel-1 and Sentinel-2 satellites were processed for the study area throughout 2021. The K-Means clustering algorithm was used to perform spatial clustering analysis on the time series data, categorizing the study area into 20 distinct clusters, as depicted in Figure 3. The final clustering curves illustrate how the cluster centers evolve during the iterative process and how data points progressively stabilize within specific clusters (Figure 4). The clustering results were interpreted by examining the various vegetation conditions and growth states, as detailed in Table A1 and Table A2.
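To make the clustering step concrete, the following is a minimal sketch of Lloyd's K-Means applied to per-pixel monthly time series. It uses synthetic "NDVI-like" curves rather than real Sentinel data, a deterministic initialization, and only 2 clusters instead of the 20 used in the study; the variable names (`crop`, `bare`) are illustrative, not from the original workflow.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's algorithm: each row of X is one pixel's time series."""
    # Deterministic init: pick k evenly spaced rows as starting centres.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assign each series to the nearest centre (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centre as the mean series of its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return labels, centers

# Toy data: two groups of 12-month "NDVI-like" curves (summer peak vs. flat).
t = np.arange(12)
crop = 0.3 + 0.4 * np.exp(-0.5 * ((t - 6) / 1.5) ** 2)  # strong summer peak
bare = 0.15 + 0.02 * np.sin(t)                          # low, nearly flat
rng = np.random.default_rng(1)
X = np.vstack([crop + 0.01 * rng.standard_normal((50, 12)),
               bare + 0.01 * rng.standard_normal((50, 12))])
labels, centers = kmeans(X, k=2)  # pixels with similar phenology share a label
```

In the actual study the clustering runs on GEE over stacked monthly VH and NDVI composites, but the grouping logic is the same: pixels with similar temporal profiles fall into the same cluster.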
Table A3 integrates the results of the NDVI change analysis (Table A1) with those of the VH band analysis (Table A2), combining them into different land cover types. Considering the land cover classification map together with the VH and NDVI time series clustering graphs enables a more in-depth analysis:
1. Correlation between land cover type and VH/NDVI values:
Urban areas may show stable low values in the VH chart (around −25.5 dB) because artificial surfaces reflect radar waves weakly, and low values in the NDVI map because of sparse vegetation cover. Agricultural regions may show seasonal fluctuations in the VH map, reflecting the effects of irrigation and crop growth, and high NDVI values during the growing season. Forest areas are likely to show high values in both maps: higher VH values because forests scatter radar waves strongly, and higher NDVI values because of their dense vegetation cover.
2. Seasonal changes:
The time series maps of both VH and NDVI show significant seasonal variations, which echo the color distribution in the land cover classification maps. For example, an increase in VH values in the summer may correspond to the expansion of green areas in the land cover map, which may represent areas that are heavily vegetated in the summer. The peak of NDVI occurs in spring and summer, which is consistent with the vegetation growth cycle shown in the land cover map.
3. Impact of human activities:
Agricultural irrigation and crop growth are clearly reflected in both VH and NDVI maps. Seasonal fluctuations in VH values may match the distribution of agricultural land in the land cover map. The influence of urbanization on VH and NDVI is also obvious. The urban area shows a stable low value in the VH map and a low value in the NDVI map, which is consistent with the distribution of the urban area in the land cover map. These charts and data offer insights into land cover and vegetation changes, aiding potato field identification.
3.2. Potato Identification Model Based on Spatial Clustering and Multi-Indices
Spatial clustering results derived from multi-source time series remote sensing features and the K-Means clustering algorithm were combined with measured potato distribution samples (based on Section 2.4.2 and Section 3.1) to determine the distribution of potatoes in 2021 (Figure 5). The model's F1 score for monitoring the potato distribution area was 0.95, as detailed in Table 3. This score, being close to 1, signifies that the model identified the majority of true-positive samples while keeping the false alarm rate low. The experimental outcomes demonstrate that integrating the K-Means algorithm with Sentinel-1/2 time series data is highly effective for spatial cluster analysis.
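The F1 score reported above balances precision and recall in one number. A minimal sketch of its computation from confusion-matrix counts follows; the counts themselves are hypothetical (chosen only so the result lands near the study's 0.95), not the actual validation tallies.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts of a binary classification."""
    precision = tp / (tp + fp)   # fraction of predicted potato pixels that are correct
    recall = tp / (tp + fn)      # fraction of true potato pixels that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical potato / non-potato validation counts (illustrative only).
p, r, f1 = precision_recall_f1(tp=95, fp=5, fn=5)
```

An F1 near 1 requires both a low false-alarm rate (high precision) and few missed fields (high recall), which is why it is a stricter summary than overall accuracy for an imbalanced potato/non-potato map.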
Four scenes were selected to illustrate the spatial detail of the classification. The first row in Figure 6 shows real Google Earth images, while the second row displays the potato classification results, with green areas representing the model's identified potato distribution and black dots indicating ground-measured distribution points. Figure 6a,b compares the classification results at different spatial resolutions: (a) shows the overall distribution of potato fields at low spatial resolution, capturing broad patterns, while (b) highlights the model's ability to identify smaller, more dispersed fields at high spatial resolution, demonstrating its versatility in both broad and detailed classification. Figure 6c shows the algorithm's performance in mountainous regions, where errors are primarily due to terrain-induced shadows. Figure 6d displays the classification of areas with towns and water bodies, where some fragmented patches remain. Nevertheless, the algorithm effectively distinguishes water bodies from nearby vegetation, despite the similar color characteristics of some farmland and water bodies in RGB images. Comparison with Google Earth images and field survey data shows that the accuracy of the K-Means classification decreases slightly in some edge and mixed-pixel regions, but the overall impact is small.
3.3. PLB Separability Results Based on Euclidean Distance
In this study, the Euclidean distance between the Sentinel-2 bands and PLB severity was used to quantify the distinguishability of spectral features. The color gradient of the heatmap ranges from white (low) to blue (high), representing the magnitude of the distances (
Figure 7). The Euclidean distance matrix shows the distance between different Sentinel-2 bands. A smaller distance indicates that the two bands are more similar in terms of spectral features; a larger distance indicates that they are more different in terms of spectral features. As shown in
Figure 7, in general, the spectra were more effective in distinguishing between different PLB severities when there was a large difference in disease severity. Moreover, it was observed that the spectral difference between PLB 85 and other PLB < 85 was particularly large.
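As a sketch of how such a separability matrix can be computed, the snippet below takes hypothetical class-mean reflectances for four severity levels (the band values are invented for illustration, following the typical pattern of NIR reflectance collapsing as disease severity rises) and forms all pairwise Euclidean distances.

```python
import numpy as np

# Hypothetical mean reflectance in five Sentinel-2-like bands per PLB severity
# class (values are illustrative, not measured): healthy canopies have high
# NIR, severe infection (85) shows a pronounced NIR collapse.
severity_spectra = {
    0:  np.array([0.04, 0.06, 0.05, 0.45, 0.30]),
    25: np.array([0.05, 0.07, 0.07, 0.38, 0.28]),
    50: np.array([0.06, 0.08, 0.10, 0.30, 0.25]),
    85: np.array([0.09, 0.10, 0.16, 0.18, 0.20]),
}
classes = list(severity_spectra)
M = np.array([severity_spectra[c] for c in classes])

# Pairwise Euclidean distances between class-mean spectra: dist[i, j] is the
# separability of severity classes i and j; larger means easier to distinguish.
dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
```

Plotting `dist` as a heatmap reproduces the white-to-blue matrix of Figure 7: the row/column for severity 85 stands out because its spectrum differs most from all milder classes.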
3.4. The Characteristics and Importance of the PLB Surveillance Model
Feature importance analysis in Figure 8 shows that the top-ranked features in the model were concentrated mainly in the fourth month (August), suggesting that this period may be a critical stage of disease development in which the physiological and morphological characteristics of the vegetation are decisive for PLB prediction. In particular, the near-infrared bands (e.g., B8_3 and B8A_3) show consistently high importance throughout the time series, highlighting the key role of vegetation biomass and health in disease prediction. The SAVI_3 and NDWI_3 indices are crucial in the fourth month, underscoring the role of vegetation water status in disease progression. The importance of the B2_1 band in the first month points to early spectral symptoms that could support early detection. Other significant indicators include B7_3, B6_3, and DWSI3_3, which may relate to vegetation moisture and physiological activity. Vegetation texture features, such as B8_savg_3, reflect the effects of disease on canopy structure. In contrast, features such as MSK_CLDPRB, MSK_SNWPRB, and NDBI had zero importance at various times, suggesting they contributed little to prediction.
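The importance ranking above comes from the Random Forest's built-in impurity-based scores. A minimal sketch of the idea, using scikit-learn as a stand-in for the study's GEE classifier and fully synthetic data (one informative NIR-like predictor plus pure-noise features), is:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors: one informative "B8-like" NIR reflectance feature
# plus three irrelevant noise features (all synthetic, for illustration only).
b8 = rng.uniform(0.1, 0.5, n)
noise = rng.normal(size=(n, 3))
# Synthetic severity: falls as NIR rises (diseased canopies lose NIR signal).
y = 100 * (0.5 - b8) + rng.normal(scale=2.0, size=n)
X = np.column_stack([b8, noise])

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
# Impurity-based importances: non-negative, sum to 1; the informative
# NIR-like feature (index 0) should dominate the ranking.
importance = rf.feature_importances_
```

The same mechanism, applied to the full multi-temporal band and index stack, produces the ranking in Figure 8; features that never reduce node impurity (as with MSK_CLDPRB or NDBI here) receive an importance of zero.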
3.5. Performance of PLB Surveillance Model
Table 4 compares seven models: CART, GTB, RF, TS–RF, STS–RF, MSTS–GTB, and MSTS–RF. Among the three machine learning models, RF performs best in PLB monitoring and can serve as the main model for PLB monitoring research, followed by GTB. The experimental results show that the STS–RF method, which uses single-source radar data, performs poorly on the validation set, with a validation R² of only 0.10, indicating weak predictive ability on new data. In contrast, the TS–RF method, which uses single-source optical data, performed well on both the training and validation sets, with a validation R² of 0.54, indicating some predictive power on new data. MSTS–RF, which combines multi-source data with time series, performed best across multiple evaluation indices, with the lowest RMSE and the highest R², and shows good prediction accuracy and generalization ability. These results suggest that MSTS–RF should be prioritized as the final model, as it not only performs well in training but also adapts best to new data in actual deployment. The MSTS–RF model fits the training data with high accuracy while also generalizing well to unseen validation data. In contrast, the MSTS–GTB model performed well on the training data (RMSE 16.53, R² 0.79) but poorly on the validation data (RMSE 30.50, R² 0.45).
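The RMSE and R² values used to rank the models above are standard definitions; a minimal sketch of both metrics, evaluated on hypothetical observed/predicted PLB percentages (illustrative numbers, not the study's data), is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

obs  = [10.0, 30.0, 50.0, 70.0, 90.0]   # hypothetical observed PLB %
pred = [12.0, 28.0, 55.0, 66.0, 92.0]   # hypothetical model predictions
```

A lower RMSE and an R² closer to 1 on the validation set, as achieved by MSTS–RF, indicate smaller prediction errors and a larger share of the severity variance explained on unseen data.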
The training set of the MSTS–RF model displayed in Figure 9 showed good monitoring performance. The RMSE was 14.69, indicating the deviation between the PLB percentage monitored by the model and the actual observations, and the R² was 0.89, suggesting that the model explains 89% of the variation in the observations. Figure 10 shows the performance of the PLB monitoring model on an independent validation set, which provides a more rigorous test of generalization than the training set. The RMSE on the validation set was 20.50, higher than the training RMSE of 14.69, and the validation R² was 0.71, lower than the training R² of 0.89.
The learning curves (Figure 11) show the following. As the training set proportion increases, the training RMSE decreases gradually, indicating that the model fits the training data increasingly well. The validation RMSE is lowest at a training proportion of 0.70, where the model's generalization ability is strongest; as the proportion falls below this, the validation RMSE rises markedly, indicating weaker generalization. The training R² is high at all proportions, showing a good fit to the training data, while the validation R² peaks at a training proportion of 0.70 and drops sharply when the proportion is reduced to 0.10. Two conclusions follow. Optimal data partition: at a training proportion of 0.70, the model performs best on the validation set, with the lowest RMSE and highest R². Overfitting and underfitting: proportions that are too high or too low degrade generalization; at 0.90 the model performs well on the training set but worse on the validation set, indicating overfitting, while at 0.10 validation performance drops significantly, indicating underfitting.
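The learning-curve procedure can be sketched as follows: hold the data fixed, vary the training fraction, and record training and validation RMSE at each split. The snippet below uses synthetic data and a simple linear fit (`np.polyfit`) as a stand-in for the Random Forest, purely to illustrate the split-and-evaluate loop, not the study's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, n)
y = 60 * x + rng.normal(scale=8.0, size=n)  # synthetic severity vs. one predictor

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

idx = rng.permutation(n)  # one fixed shuffle, re-split at each fraction
results = {}
for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    n_tr = int(n * frac)
    tr, va = idx[:n_tr], idx[n_tr:]
    coef = np.polyfit(x[tr], y[tr], 1)        # stand-in model fitted on train
    results[frac] = (rmse(y[tr], np.polyval(coef, x[tr])),   # training RMSE
                     rmse(y[va], np.polyval(coef, x[va])))   # validation RMSE
```

Plotting the two RMSE series in `results` against the training fraction yields a learning curve of the kind shown in Figure 11, from which the best partition ratio can be read off.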
3.6. The Distribution of PLB Monitoring Results
Figure 12 shows the distribution of pixel values of PLB severity detected by the surveillance model. The histogram shows PLB values ranging from 4.75 to 68.25, i.e., from very mild to very severe. Most pixel values cluster around 50, indicating that most of the monitored PLB severity was moderate, while pixels at the lower (near 4.75) and upper (near 68.25) ends are fewer, indicating that extremely mild or extremely severe conditions are less common. Because many predicted values fall in the early and middle stages, the model appears capable of monitoring PLB in its early stages, helping timely measures to be taken to control disease development. Overall, the model identifies moderate PLB severity well and can monitor the disease at early stages, which is important for the management and control of potato late blight.
Figure 13 illustrates the severity of the disease monitored at various locations within the study area, with higher values denoting a greater disease risk as evaluated by the model. The distribution plot highlights substantial spatial heterogeneity in disease risk across the study area. High-risk areas, characterized by monitoring values nearing 80, tend to be clustered in particular geographical locations. This clustering may be associated with local environmental conditions, crop cultivation practices, or historical disease incidence patterns. The disease probability distribution map constitutes a potent tool for agricultural managers, facilitating the identification and prioritization of disease control measures. By targeting these high-risk areas, resources can be more efficiently allocated, thereby enhancing the efficacy and effectiveness of disease management strategies.
Most of the validation points in Figure 14 have small errors, with the majority concentrated in the lower intervals (below 25). Points with larger errors tend to be more spatially homogeneous. With an RMSE of 26.52 at these points, the results suggest that the model maintains acceptable accuracy across the mapped area.