1. Introduction
Silage maize, a critical feedstock for the livestock industry, has seen a steady increase in cultivation area. Mapping the spatial distribution of silage maize and grain maize can guide the livestock sector in adjusting planting strategies and optimizing regional agricultural layouts. To date, numerous studies have explored the classification of spectrally and phenologically similar crops using machine learning, deep learning, and threshold-based methods, and many have demonstrated that maize can be effectively distinguished from other crop types. However, relatively little research has addressed the finer classification of different maize varieties. The similar growth characteristics of silage maize and grain maize, particularly their overlapping growth stages, pose significant challenges for remote sensing-based classification and mapping, in contrast to the distinct separability typically observed between different crop types. Furthermore, the short interval between the harvests of the two maize varieties complicates field surveys of their spatial patterns. Therefore, developing image processing techniques that rapidly and accurately differentiate silage maize from grain maize, while ensuring cross-year applicability in maize mapping, is of substantial importance for advancing precision livestock farming and sustainable agricultural practices.
Recent studies around the world have used machine learning (ML) and deep learning (DL) approaches to achieve precise maize identification and acreage monitoring based on multisource remote sensing. These methods typically rely on large-scale sample datasets to train models, enabling the generation of high-resolution national maize distribution maps. For example, X. Li et al. generated 10 m resolution maize cultivation maps for China from 2017 to 2021 using Sentinel-2 satellite imagery, combined with the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) [1]. Their methodology leveraged both recurrent neural networks (RNNs) and a random forest (RF) model, achieving overall classification accuracies ranging from 0.83 to 0.95 across the five-year dataset. This work provides a vital reference for global food security research by offering spatially explicit insights into maize distribution patterns. Similarly, N. You et al. collected extensive ground-truth samples through field surveys and high-resolution image interpretation, covering major crops such as maize, soybean, and rice [2]. By employing an RF algorithm with spectral, temporal, and textural features, they implemented a hierarchical classification approach to distinguish croplands and crop types. The resultant annual 10 m resolution crop maps for Northeast China (2017–2019) demonstrated high reliability, with overall accuracies ranging from 0.81 to 0.86 and strong consistency with statistical yearbook records (R² = 0.83–0.99). B. Wang et al. developed SSATNet, a deep learning architecture integrating 3D–2D convolutions, spectral–spatial morphological operations, and a Transformer encoder, based on a hyperspectral maize image dataset [3]. The model extracts spectral–spatial correlation features via 3D convolutions, enhances local structures through dilation/erosion morphological operations, and captures global dependencies using a cross-attention Transformer mechanism. It achieved exceptional performance on the test set, with 0.9865 precision, 0.9857 recall, and a Kappa coefficient of 0.9965, offering a high-performance and interpretable solution for hyperspectral crop classification and advancing precise variety identification in smart agriculture. H. Li et al. addressed the limitations of insufficient reference and training data in traditional supervised classification by collecting extensive samples through field surveys, statistical yearbook inferences, and existing crop maps [4]. They employed an RF classifier with multispectral bands, vegetation indices, and time series data as input features, coupled with a regression estimator to integrate classification results with sample data for crop area estimation. By optimizing probability thresholds to align with sample area estimates, they successfully generated annual 10 m resolution maps of soybean and maize across China (2017–2021), significantly enhancing the accuracy of crop classification and area estimation. M. Gilcher et al. evaluated three classifiers, namely a generalized linear model (GLM), RF, and a support vector machine (SVM), combined with two spatial autocorrelation mitigation methods: simple kriging (SK) and Gaussian blur [5]. The SVM and RF classifiers reliably distinguished maize pixels, with Kappa values consistently exceeding 0.9, while GLM exhibited markedly inferior performance. Their study demonstrated that regression kriging and Gaussian blur both improved classification performance and reduced spatial autocorrelation under uniformly distributed samples. However, Gaussian blur proved to be more robust for clustered samples due to its lower dependency on sample distribution.
In recent years, increasing attention has been given to the interpretability and generalization ability of ML/DL methods. For instance, Ji Ge et al. proposed a novel interpretable deep learning model, the explainable Mamba UNet (XM-UNet), for mapping global rice cultivation areas [6]. By integrating selective scanning with convolutional neural networks, this model effectively balances the local and global spatial features of rice fields and provides interpretable temporal feature importance. This approach offers valuable guidance for rice distribution mapping and fills a notable gap in the existing research. Similarly, Yijia Xu et al. introduced a Transformer-based spectral–temporal network, STNet, which leverages the self-attention mechanism to extract informative features from time series remote sensing imagery for crop classification tasks [7]. In addition, they proposed a self-supervised pretraining framework, SITS-MoCo, which utilizes unlabeled remote sensing data to pretrain the model. This enables the learning of robust feature representations, reduces dependence on labeled data, and supports large-scale crop classification applications.
Threshold-based methods are relatively simple to use, have a clear mechanism, and are especially easy to deploy quickly on remote sensing cloud platforms such as Google Earth Engine (GEE). For example, Tao Jian-bin et al. proposed the Phenology-based Winter Rapeseed Index (PWRI), which effectively distinguishes winter rapeseed from winter wheat by analyzing their unique spectral characteristics at different phenological stages [8]. By integrating multi-temporal Sentinel-2 imagery on the GEE platform, they employed a hierarchical classification method based on dynamic thresholds to identify winter rapeseed. This approach offers a novel solution for the large-scale, high spatial resolution mapping of winter rapeseed. Similarly, Gaoxiang Yang et al. developed a knowledge-guided machine learning method that integrates multi-source remote sensing and environmental data to extract annually updated training samples for the long-term mapping of winter wheat in China [9]. The study first identified key phenological stages based on crop growth dynamics, using variations in spectral or polarization features to enhance the separability between crop types. Then, annual training samples were automatically extracted and optimized from candidate crop pixels, and harmonic features were used to train a machine learning classifier to produce year-by-year distribution maps of winter wheat across China.
To mitigate the limited interpretability of ML/DL models, researchers have adopted phenology-based approaches for maize identification and monitoring. For instance, L. Zhong et al. developed a phenology-guided random forest classifier using Landsat TM/ETM+ imagery and meteorological data from Doniphan County, Kansas (2006–2010) [10]. By extracting phenological features from the Enhanced Vegetation Index (EVI) time series and integrating derived parameters such as cumulative growing degree days (GDD), their model achieved the cross-year classification of maize and soybean, with average accuracies of 0.9010 (same-year training) and 0.8230 (cross-year validation). This framework provides a low-cost, high-frequency solution for agricultural remote sensing, particularly in scenarios with scarce ground truth data.
In another study, Zhong et al. extracted crop phenological stages by fitting EVI time series curves and utilized shortwave infrared (SWIR) band reflectance to differentiate soybean and maize [11]. They designed a decision tree classifier with expert-informed decision rules that synergized phenological and spectral characteristics, generating annual soybean and maize maps for Paraná State, Brazil (2010–2015). This robust algorithm enables the cost-effective production of multi-year consistent cropland cover data, with a specialized focus on soybean and maize discrimination, demonstrating adaptability to heterogeneous agricultural landscapes. However, phenological indices exhibit significant variability across regions and years, particularly under climate change or extreme weather events, where shifts in crop growth cycles may degrade classification accuracy.
Furthermore, despite the rapid advancement of satellite remote sensing as a cornerstone for land surface monitoring, natural resource surveys, and disaster response, pervasive data gaps persist due to cloud cover, sensor malfunctions, atmospheric disturbances, and temporal sampling limitations [12]. For instance, Li et al. evaluated the applicability of time series smoothing methods for extracting the grassland start of season (SOS) in spring on the Tibetan Plateau, emphasizing the impact of smoothing parameters on SOS accuracy [13]. While default parameters yielded acceptable results for most methods, Whittaker smoothing required parameter optimization, highlighting that smoothing algorithms and SOS extraction methods jointly determine phenological accuracy. Zhou et al. systematically compared five time series reconstruction methods using MOD09GA data (2001–2014) to assess their performance across diverse geographies and vegetation types [14]. They synthesized pixel-wise NDVI reference sequences from raw daily NDVI data, simulated noise distributions based on QA flags, and demonstrated region-specific optimal methods, underscoring the lack of a universal solution. To address continuous data gaps, Cao et al. proposed the Spatio-Temporal Savitzky–Golay (STSG) method, which assigns weights to NDVI points by comparing raw values with initial estimates, then applies a weighted SG filter to enhance temporal continuity and accuracy [15]. Similarly, B. Qiu et al. developed a Successive Correction Wavelet Transform (SCWT) for smoothing MODIS 8-day EVI2 composites in China (2013) [16]. Compared to five conventional methods, SCWT excelled in handling complex multi-growth-cycle crop signals, preserving authentic local extrema while minimizing false inflection points, as validated by metrics including fidelity, smoothness, and phenological extraction efficiency.
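As an illustration of the kind of time series smoothing discussed above, the following minimal pure-Python sketch implements a Whittaker smoother with a second-order difference penalty. It is not taken from any of the cited studies. The function name `whittaker_smooth`, the smoothing parameter `lam`, the sample values, and the choice to give cloud-contaminated observations a weight of zero are all assumptions made for illustration.

```python
def whittaker_smooth(y, weights=None, lam=10.0):
    """Whittaker smoother with a second-order difference penalty.

    Minimises sum(w * (y - z)**2) + lam * sum((second differences of z)**2)
    by solving the linear system (W + lam * D'D) z = W y.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    w = [1.0] * n if weights is None else [float(v) for v in weights]

    # Assemble A = W + lam * D'D; only entries within two of the diagonal are non-zero.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = w[i]
    stencil = (1.0, -2.0, 1.0)  # one row of the second-difference operator D
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * stencil[p] * stencil[q]
    b = [w[i] * float(y[i]) for i in range(n)]

    # Forward elimination restricted to the band. A is symmetric positive
    # definite when lam > 0 and at least two weights are positive, so no
    # pivoting is used (an assumption of this sketch).
    for col in range(n):
        for row in range(col + 1, min(col + 3, n)):
            factor = a[row][col] / a[col][col]
            for k in range(col, min(col + 3, n)):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i]
        for k in range(i + 1, min(i + 3, n)):
            s -= a[i][k] * z[k]
        z[i] = s / a[i][i]
    return z


# Hypothetical index series with one cloud-contaminated value at position 2.
series = [0.2, 0.3, -0.5, 0.5, 0.6]
mask = [1.0, 1.0, 0.0, 1.0, 1.0]  # weight 0 marks the contaminated observation
smoothed = whittaker_smooth(series, weights=mask, lam=10.0)
```

Because the contaminated value carries zero weight, the smoother fills it from its neighbours rather than fitting it. Larger values of `lam` give smoother curves at the cost of fidelity, which is one reason the parameter generally needs tuning for phenology extraction.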
In the context of silage maize-specific research, Hamid Salehi Shahrabi et al. introduced a novel parameter, SPGH/LOS (Spectral-Phenological Growth Height/Length of Season), and developed an automated silage maize detection framework using Sentinel-2 time series data [17]. This method leverages crop phenological rules to classify maize without requiring training data. Validated across four regions (Abyek, Marvdasht, Mashhad, and Tulare) in 2017, the framework achieved Kappa coefficients of 0.89, 0.80, 0.90, and 0.80, respectively, with overall accuracies consistently exceeding 0.90, demonstrating robust generalizability for label-free crop mapping.
While existing studies have achieved substantial progress in multi-source remote sensing-based precise identification and acreage monitoring of maize, research focusing specifically on silage maize recognition and area quantification remains limited. To address this gap, this study leverages the GEE platform, utilizing preprocessed Sentinel-2 imagery from Hohhot City (2023) as the foundational dataset. Through the phenological analysis of silage maize and grain maize, we identified their critical harvest periods and applied the Whittaker smoothing filter to refine spectral–temporal profiles. This enabled the development of the Temporal Difference-based Silage Maize Identification Model (TempDiff-SMID), which exploits harvest timing disparities for classification. The TempDiff-SMID was rigorously benchmarked against the random forest-based Silage Maize Classification Model (SMRF), demonstrating superior accuracy. Furthermore, by optimizing temporal thresholds, the model exhibited robust cross-temporal applicability in multi-year silage maize mapping. This framework provides a scalable, interpretable solution for fine-scale differentiation and area monitoring of silage and grain maize, addressing critical needs in agricultural resource management.
2. Study Area and Data Sources
2.1. Study Area
The study area encompasses Hohhot City, Inner Mongolia Autonomous Region, located between 40°51′–41°8′ N and 110°46′–112°10′ E, with a total area of 17,200 km². Hohhot features a mid-temperate continental monsoon climate, characterized by synchronized rainfall and heat during the growing season, providing optimal thermal conditions for maize growth across all developmental stages.
Maize cultivation in Hohhot is concentrated in the Tumochuan Plain and the southern foothills of the Yin Mountains, which are regions endowed with abundant hydrothermal resources, sufficient sunlight, and fertile soils. According to 2024 statistical data, the total maize planting area in Hohhot reached 239,642 hectares, accounting for 71.1% of the city’s total grain-crop acreage. Other cereals account for only 39,000 hectares, and legumes cover just 16,700 hectares, indicating that maize dominates grain production in Hohhot. Notably, the silage maize cultivation area has increased annually, reaching 81,733 hectares in 2024. The local cropping system follows a single annual harvest cycle, with maize typically sown from late April to early May and harvested between September and early October.
The spatial distribution of the study area is illustrated in Figure 1.
2.2. Data Sources
The research data include Sentinel-2 imagery, DEM data, statistical yearbook data, phenological data, and UAV data.
Sentinel-2 is an Earth observation satellite mission under the Copernicus Programme led by the European Space Agency (ESA). It carries 13 multispectral bands and focuses on high-resolution multispectral imaging for land, vegetation, and water monitoring. Its high spectral resolution enables the capture of richer spectral information, facilitating more accurate identification and classification of land cover types. Sentinel-2 data are widely used in agriculture, forestry, environmental monitoring, and disaster management [18]. The Sentinel-2 constellation comprises Sentinel-2A and Sentinel-2B, which together reduce the revisit cycle to five days. The COPERNICUS/S2_HARMONIZED dataset on the GEE platform is a preprocessed version of the raw data that incorporates geometric correction, radiometric calibration, cloud masking, band alignment, and temporal normalization.
NDVI (Normalized Difference Vegetation Index), proposed by Rouse et al. in 1974, is based on the spectral reflectance characteristics of vegetation in the red (R) and near-infrared (NIR) bands. Vegetation exhibits strong absorption in the red band and high reflectance in the near-infrared band. NDVI leverages this distinct spectral behavior to quantify vegetation coverage and health status [19,20], as follows:
$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R}}$$
In the equation, $\rho_{NIR}$ denotes the reflectance in the NIR band, and $\rho_{R}$ denotes the reflectance in the R band.
LSWI (Land Surface Water Index) is based on the spectral reflectance characteristics of vegetation in the shortwave infrared (SWIR) and near-infrared (NIR) bands and is used to assess surface moisture conditions. Moist vegetation exhibits lower reflectance in the SWIR band, where liquid water absorbs strongly, and higher reflectance in the NIR band. By quantifying this contrast, LSWI effectively captures variations in land surface water content, including vegetation canopy moisture and soil wetness [21], as follows:
$$\mathrm{LSWI} = \frac{\rho_{NIR} - \rho_{SWIR}}{\rho_{NIR} + \rho_{SWIR}}$$
In the equation, $\rho_{NIR}$ denotes the reflectance in the NIR band, and $\rho_{SWIR}$ denotes the reflectance in the SWIR band.
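Both indices are normalized differences of two band reflectances, so they can share one helper. The short sketch below shows the arithmetic on single reflectance values. The function names and the sample reflectances are hypothetical. The band names in the comments (B8 for NIR, B4 for red, B11 for SWIR) are one common Sentinel-2 choice and are an assumption here, since this section does not state which SWIR band was used.

```python
import math


def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b), or NaN if the sum is zero."""
    total = band_a + band_b
    if total == 0:
        return math.nan
    return (band_a - band_b) / total


def ndvi(nir, red):
    """NDVI from NIR and red reflectance (e.g., Sentinel-2 B8 and B4)."""
    return normalized_difference(nir, red)


def lswi(nir, swir):
    """LSWI from NIR and SWIR reflectance (e.g., Sentinel-2 B8 and B11)."""
    return normalized_difference(nir, swir)


# Hypothetical reflectances for a dense, moist maize canopy.
canopy_ndvi = ndvi(nir=0.5, red=0.1)    # 0.4 / 0.6
canopy_lswi = lswi(nir=0.4, swir=0.2)   # 0.2 / 0.6
```

Both indices fall between -1 and 1. After harvest, the loss of green, moist canopy would be expected to lower both values, which is the signal a harvest-timing rule can exploit.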
A Digital Elevation Model (DEM) is a digital representation of terrain elevation that provides explicit spatial information on topographic relief, slope gradient, and aspect. DEM data are typically stored in raster format with spatial resolutions ranging from 30 m to sub-meter, supporting multi-scale research and applications [22]. Hohhot City is situated on the Inner Mongolia Plateau, which is characterized by a terrain dominated by plains and low hills, with generally gentle slopes but localized undulations. The northern region transitions into the Daqing Mountains, while the southern area comprises the Tumochuan Plain, creating significant topographic contrasts. The spatial distribution and elevation characteristics of the DEM data are illustrated in Figure 1.
The statistical data were sourced from the Hohhot Statistical Yearbook to quantify and validate the planting areas of grain maize and silage maize in Hohhot. In 2024, the total maize cultivation area in Hohhot reached 3.8159 million mu (about 254,393 hectares), with silage maize accounting for 1.226 million mu (about 81,733 hectares).
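The hectare figures above follow from the standard conversion of 15 mu to 1 hectare. A small sketch of the arithmetic is given below. The function name is hypothetical, and the silage share is derived here from the two yearbook figures rather than reported in the source.

```python
MU_PER_HECTARE = 15  # 1 mu = 1/15 hectare


def mu_to_hectares(area_mu):
    """Convert an area in mu to hectares."""
    return area_mu / MU_PER_HECTARE


total_maize_ha = mu_to_hectares(3.8159e6)   # about 254,393 ha
silage_maize_ha = mu_to_hectares(1.226e6)   # about 81,733 ha
silage_share = silage_maize_ha / total_maize_ha  # roughly 0.32
```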
Phenological data play a critical role in agricultural planning, ecological studies, and climate change analysis. In Hohhot, silage maize is typically sown in early May, with a growth cycle of approximately 116 days, progressing through the following key developmental stages: emergence, jointing, tasseling, milking stage, and dough stage. The optimal harvest window for silage maize spans from the late milking stage to the early dough stage (mid-August to early September), when grains are not fully matured but the stalks and kernels exhibit peak nutritional value. Grain maize shares a similar sowing period (early May) but has a longer growth cycle of 120–130 days. Its harvest occurs during the full maturity stage, when grains are completely ripened, typically from late September to early October.
5. Discussion
This study leverages the GEE platform and multi-temporal Sentinel-2 imagery to develop the TempDiff-SMID for discriminating silage maize from grain maize, capitalizing on their divergent phenological characteristics during critical growth stages. By integrating NDVI and LSWI time series data with Whittaker smoothing, the proposed model establishes explicit decision rules based on these phenological differences and achieves high-accuracy classification (overall accuracy: 0.9291; Kappa coefficient: 0.8923), enabling the fine-grained extraction of maize varieties. At present, research on silage maize in China remains limited, and no existing silage maize classification products are available for direct comparison with our results. Therefore, we compared our classification outcomes with the publicly available national maize classification results for 2023. The comparison indicates that our classification is consistent with existing products in accuracy and spatial detail. Detailed comparisons are presented in Figure 8.
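For readers less familiar with the two accuracy metrics reported here, the sketch below shows how overall accuracy and the Kappa coefficient are conventionally derived from a confusion matrix. The matrix values are invented for illustration and are not the validation data of this study. The function name is likewise hypothetical.

```python
def overall_accuracy_and_kappa(confusion):
    """Compute overall accuracy and Cohen's Kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were mapped to class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class matrix (rows: reference, columns: mapped).
example = [
    [45, 5],   # reference silage maize
    [10, 40],  # reference grain maize
]
accuracy, kappa = overall_accuracy_and_kappa(example)
```

In this example, 85 of 100 samples lie on the diagonal, chance agreement is 0.5, and Kappa is therefore 0.7. Kappa is lower than overall accuracy because it discounts the agreement expected by chance.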
Unlike conventional “black-box” approaches such as RF, the knowledge-driven framework provides hierarchical and explicit rules whose classification logic maps directly to phenological transitions and spectral features. For instance, the model quantifies harvest timing differences through LSWI decline thresholds during the late milking stage, a feature intrinsically linked to silage maize’s earlier harvest. This transparency facilitates cross-year adaptability, enabling manual threshold adjustments to accommodate interannual phenological shifts (e.g., applications to 2021 and 2022 achieved accuracies of 0.8816 and 0.8621, respectively). Compared to machine learning and deep learning methods, it reduces dependence on training samples and computational costs, making it well-suited for agricultural applications characterized by limited data availability, high interpretability requirements, and dynamic adaptation needs.
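To make the idea of an explicit harvest-timing rule concrete, the following is a deliberately simplified sketch of how a rule based on an LSWI decline might look. It does not reproduce the actual TempDiff-SMID rules or thresholds. The function names, the drop threshold `min_drop`, and the cutoff day `silage_cutoff_doy` are hypothetical. The cutoff of day 258 (mid-September in a non-leap year) is chosen only because it falls between the silage and grain harvest windows described in Section 2.2.

```python
def first_sharp_drop(days_of_year, values, min_drop):
    """Return the day of year at which the first sharp decline is observed, or None."""
    for i in range(1, len(values)):
        if values[i - 1] - values[i] >= min_drop:
            return days_of_year[i]
    return None


def classify_maize(days_of_year, lswi_series, min_drop=0.15, silage_cutoff_doy=258):
    """Label a maize pixel from the timing of its harvest-related LSWI decline.

    min_drop and silage_cutoff_doy are illustrative values, not calibrated thresholds.
    """
    harvest_doy = first_sharp_drop(days_of_year, lswi_series, min_drop)
    if harvest_doy is None:
        return "undetermined"
    if harvest_doy <= silage_cutoff_doy:
        return "silage maize"
    return "grain maize"


# Hypothetical smoothed LSWI profiles sampled every ten days.
days = [220, 230, 240, 250, 260, 270, 280]
early_harvest = [0.40, 0.41, 0.39, 0.15, 0.10, 0.08, 0.07]
late_harvest = [0.40, 0.41, 0.40, 0.38, 0.35, 0.33, 0.10]

early_label = classify_maize(days, early_harvest)
late_label = classify_maize(days, late_harvest)
```

Because the rule consists of two readable parameters, adapting it to another year amounts to shifting the cutoff day or the drop threshold, which is the kind of manual adjustment described above.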
However, current threshold optimization relies on empirical adjustments, limiting scalability for large-scale multi-year monitoring. While the model enhances interpretability, it also imposes certain limitations on automation and cross-regional applicability, particularly in cases where phenological variations occur across years or regions. Future work could integrate machine learning-driven phenological phase detection or climate-informed dynamic parameterization to automate threshold tuning, thereby enhancing model generalizability while retaining interpretability. Such advancements could bridge the gap between rule-based and data-driven paradigms in agricultural remote sensing.
In addition, in terms of data sources, this study mainly relies on Sentinel-2 image data, combined with vegetation indices such as NDVI and LSWI to classify silage maize. Although Sentinel-2 data have advantages in spectral resolution and temporal resolution, a single data source may not be able to fully meet the classification needs in areas with complex terrain or heavy cloud coverage. In addition, issues such as spectral mixing and within-field heterogeneity at the parcel scale may further affect classification accuracy. Future research could consider the fusion of multi-source remote sensing data to enhance temporal resolution. For example, integrating optical imagery from Sentinel-2, Landsat-8/9, and MODIS with varying spatial and temporal resolutions could facilitate the construction of a temporally denser dataset, thereby improving the detection of key phenological windows for distinguishing between grain maize and silage maize. In regions characterized by frequent cloud cover and heavy rainfall, the incorporation of SAR data, such as Sentinel-1, in combination with optical imagery, may further enhance the robustness of silage maize identification under complex environmental conditions. Moreover, the remote sensing imagery used in this study has a spatial resolution of 10 m, which enables the distinction between silage maize and grain maize across relatively large spatial extents with reasonably high accuracy. If finer classification results at the field scale are desired, then the incorporation of higher-resolution UAV imagery could be considered in future work. However, this approach would significantly increase the cost of data acquisition. Therefore, the decision to employ UAV imagery must balance the trade-off between mapping accuracy and cost-effectiveness.
The results of this study provide technical support for the refined classification of silage maize and have broad application prospects. As an important source of feed in animal husbandry, accurate monitoring of the planting area of silage maize is of great significance for optimizing feed resource allocation and adjusting planting structure.
Building upon future advancements in automated phenological phase detection and dynamic threshold determination, this method can be scaled up to larger spatial extents, enabling the provincial- and even national-level mapping of silage maize. Such large-scale applications would provide critical foundational data to support food security, precision agriculture, and the sustainable development of the livestock industry.
With the advancement of deep learning technologies, future research could explore the integration of decision tree models with deep learning approaches. Specifically, time series models such as LSTM and Temporal CNN can be employed to automatically identify key change points in vegetation index time series (e.g., NDVI, LSWI), thereby enabling the construction of regionally adaptive models. These models can output optimal classification thresholds or critical phenological windows specific to different regions, addressing the limited generalization capability of traditional models across years and geographic areas. Such integration is expected to further enhance both the classification accuracy and the level of automation of the proposed approach.
6. Conclusions
This study developed a TempDiff-SMID on the GEE platform, utilizing multi-temporal Sentinel-2 imagery and leveraging the distinct phenological characteristics between silage maize and grain maize during critical growth stages. The model successfully achieved the high-precision classification of silage maize and grain maize in the study area, generating accurate 10 m resolution distribution maps of silage maize. The main conclusions are as follows:
This study systematically compared two classification methods leveraging phenological differences for silage maize classification, with both approaches demonstrating robust performance. The TempDiff-SMID achieved an overall accuracy of 0.9291 and a Kappa coefficient of 0.8923, outperforming the SMRF (0.9043 accuracy, 0.8511 Kappa). Crucially, the TempDiff-SMID framework exhibited superior interpretability through its transparent classification rules that directly characterize spectral–phenological differences between silage and grain maize, particularly in capturing harvest timing variations through vegetation index trajectories. This inherent interpretability not only facilitates model diagnostics but also enables cross-year applications, as validated by consistent performance across multiple growing seasons (2021–2023). Furthermore, the TempDiff-SMID significantly reduces dependence on extensive labeled training data by effectively integrating multi-temporal spectral signatures, topographic features, and phenological characteristics. The model thus represents an optimal balance between classification accuracy and mechanistic understanding, addressing fundamental limitations of conventional “black-box” machine learning methods in agricultural remote sensing.
The TempDiff-SMID demonstrated robust performance in cross-year silage maize classification, achieving an overall accuracy of 0.8621 (Kappa coefficient: 0.7905) for 2022 and 0.8816 (Kappa: 0.8192) for 2021 applications. This temporal adaptability enables the reliable multi-year monitoring of silage maize cultivation through systematic threshold adjustments, facilitating the accurate detection of interannual planting area variations. The framework’s capacity to support long-term agricultural management holds significant implications for optimizing forage resource allocation and guiding regional cropping structure adjustments, particularly in dynamic farming systems.
The findings of this study provide robust technical support for fine-scale silage maize classification, demonstrating the critical influence of phenological differences in distinguishing silage maize from grain maize. This approach offers significant potential for practical applications, particularly in enabling agricultural authorities to optimize cropping systems and precisely manage forage resources through data-driven decision-making. The developed framework exhibits notable extensibility, with methodological adaptability to other crop classification scenarios. Furthermore, the threshold adjustment mechanism inherent in the decision tree model establishes a viable pathway for reliable cross-year applications, enhancing the model’s utility in long-term agricultural monitoring systems.