Garlic Crops’ Mapping and Change Analysis in the Erhai Lake Basin Based on Google Earth Engine

Garlic (Allium sativum) is an important economic crop in China. Remote sensing identification of garlic still leaves room for improvement, and the high-precision classification of garlic has become an important issue; however, to the best of our knowledge, few studies have focused on garlic area mapping. Here, we propose a method for identifying garlic crops using samples and a multi-feature dataset under limited conditions. The results indicate the following:


Introduction
Garlic (Allium sativum), a globally significant economic crop and vegetable, is cultivated worldwide. With the growth of the global population and changes in dietary patterns, the cultivation area and production of garlic have continued to increase. Garlic cultivation requires large amounts of fertilizers and pesticides. With the growing global emphasis on ecological conservation and sustainable development, remote sensing identification of garlic is needed to support industry adjustment and precise management of garlic cultivation. However, the above-ground part of garlic is spectrally similar to other vegetation, which makes it difficult to extract garlic directly from optical data and to identify it with high precision.
Remote sensing technology is now widely used to identify crops, monitor their growth status, and track other processes related to crop development. Images from MODIS, Landsat, GF, Sentinel, and other sensors can support agricultural development and crop identification efficiently and intelligently [1,2]. Researchers combine multi-source, high-resolution data and exploit their abundant temporal information for crop classification [3]. Zhao et al. [4] combined Landsat imagery with multitemporal data to create 30 m spatial resolution bamboo distribution maps for Uganda, Ethiopia, and Kenya. Another study proposed a composite hybrid evolution algorithm and a temporal similarity threshold to identify winter wheat, achieving an overall accuracy of 99% [5]. Other researchers achieved high-precision rice classification by combining phenological features into a time-series curve [6]. Both pixel-based and object-oriented classification methods are commonly used to improve classification accuracy. For instance, Chen [7] developed a POK-based method that integrates pixel- and object-oriented approaches, with favorable results; Wessel [8] classified deciduous trees, oak trees, and other classes using both pixel-based and object-oriented methods; and Mathieu [9] verified the high accuracy of object-oriented classification in mapping multiple tree species. However, these methods rely on local computer analysis, which leads to low efficiency, long processing times, and uncertain identification accuracy. As data volumes grow, traditional computing models struggle to store and process large-scale, high-resolution data, leading to lag, data loss, and similar problems. Remote sensing cloud computing platforms address these problems by enabling large-scale processing and analysis. Currently, Google Earth Engine (GEE) is one of the most mature remote sensing cloud computing platforms and is widely used in China and abroad [10]. Beyond classifying major crops such as rice, wheat, and maize, remote sensing has also been used to identify other crops such as palm trees [11] and tea plantations [12], improving classification effectiveness and accuracy and further refining remote sensing monitoring of crop cultivation. Therefore, GEE-based remote sensing models for garlic extraction are important for achieving high-precision monitoring of garlic planting.
In this study, we classified garlic crops in the Erhai Lake Basin using Landsat images and an optimal multidimensional feature set constructed for extraction in Google Earth Engine (GEE). The study addresses the following questions: (1) Is the kNDVI effective for garlic crop identification? (2) Are the feature dataset and random forest classification effective for classifying the diverse cultivated crops of Yunnan? (3) Can the spatiotemporal distribution of garlic crops in the Erhai Lake Basin be mapped with satisfactory accuracy?

Study Area
Erhai Lake, the seventh-largest freshwater lake in China, lies on the Yunnan Plateau in the southwest of the country, at the southern end of the Hengduan Mountains, between approximately 100°05′ and 100°17′ E and 25°36′ and 25°58′ N. The total area of the basin is 2565 km². The Erhai Lake Basin has a subtropical plateau monsoon climate, with mild, spring-like temperatures throughout the year and distinct seasonality. The annual average temperature is 15.5 °C, and the average annual precipitation is 1000 mm. The Erhai Lake Basin is an important garlic-producing area in Yunnan Province. Its terrain is high in the west and low in the east, which strongly influences garlic planting methods, irrigation, management, and other aspects. Field investigation showed that garlic is cultivated in the low-altitude areas of the basin, shown in light green in Figure 1. Different land cover types, such as arable land, forest land, grassland, construction land, and water areas, also directly or indirectly affect the growth environment and yield of garlic, as shown in Figure 1.

Methods
Garlic from the Erhai Lake Basin is a geographical indication product and one of the main sources of income for local farmers. Mapping its distribution can help the government, farmers, and other stakeholders understand the planting status and distribution of garlic, which in turn supports more effective agricultural policies, management measures, and market strategies. First, the images are composited to minimize cloud cover, clipped to the study area, and resampled to a common resolution. Next, terrain features are derived from the DEM data, and texture features are extracted with the gray-level co-occurrence matrix algorithm. Finally, the bands are stacked to form a new multi-band image.
This study analyzes the importance of spectral, texture, and terrain features in crop identification. After the multidimensional features were constructed, the optimal features were selected, and the random forest algorithm was used to classify crops from 1999 to 2023. The classification accuracy was then evaluated using validation samples and statistical data, and the spatiotemporal changes in the garlic crops were analyzed, as shown in Figure 2.

Image Data
This study used the Landsat 5 and Landsat 8 image collections provided by GEE. Image collections were built around the planting and maturity time of garlic, selecting images for 1999, 2005, 2010, 2014, 2018, and January to February 2023. First, cloud and shadow pixels were masked using the Quality Assessment (QA) band: bit masks for cloud shadow and clouds (cloudShadowBitMask and cloudsBitMask) were defined and applied to the QA band with bitwise operations to identify and mask cloudy and shadowed pixels. Atmospheric correction and radiometric calibration were applied to the data. Images with a cloud coverage of no more than 30% were selected and then clipped and cloud-masked. These steps provide high-quality, accurate surface reflectance and radiance information for garlic identification. The Landsat 8 dataset contains 11 spectral bands. Bands B1-B9 are acquired by the OLI sensor at a resolution of 30 m, except the panchromatic band (B8), which has a resolution of 15 m; the swath width is 185 km. Bands B10 and B11 are acquired by the TIRS sensor at a resolution of 100 m. This band description applies only to Landsat 8, not to Landsat 5. To improve the accuracy and consistency of the data, all images were resampled to 30 m with a resampling function, "var resampleImage = function(image) {...}".
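The QA-band masking and the 30% scene filter described above can be sketched in plain Python. This is a minimal illustration, not the study's GEE script. It assumes the Landsat Collection 1 Surface Reflectance `pixel_qa` bit layout (bit 3 = cloud shadow, bit 5 = cloud), the convention used in GEE's classic Landsat 8 masking example; other collections may store these flags on different bits. In GEE the same test is usually written with `bitwiseAnd(...).eq(0)` followed by `updateMask()`. The helper names are hypothetical.

```python
# Bit positions are an assumption: Landsat Collection 1 SR "pixel_qa" layout.
CLOUD_SHADOW_BIT_MASK = 1 << 3  # corresponds to cloudShadowBitMask
CLOUDS_BIT_MASK = 1 << 5        # corresponds to cloudsBitMask


def is_clear(qa_value: int) -> bool:
    """True when both the cloud-shadow bit and the cloud bit are zero."""
    return (qa_value & CLOUD_SHADOW_BIT_MASK) == 0 and (qa_value & CLOUDS_BIT_MASK) == 0


def mask_pixels(reflectance, qa_values):
    """Replace cloudy or shadowed pixels with None, similar in spirit to updateMask()."""
    return [r if is_clear(q) else None for r, q in zip(reflectance, qa_values)]


def keep_scene(cloud_cover_percent: float, max_cloud_percent: float = 30.0) -> bool:
    """Scene-level filter: keep images with at most 30% cloud cover."""
    return cloud_cover_percent <= max_cloud_percent


if __name__ == "__main__":
    # QA values: 322 (no shadow/cloud bits), 328 (bit 3 set), 352 (bit 5 set)
    print(mask_pixels([0.12, 0.08, 0.30], [322, 328, 352]))
    print(keep_scene(25.0), keep_scene(31.0))
```

Working on bit masks rather than on decoded QA integers keeps the test independent of the other flags (for example, the water or snow bits) that share the same QA value.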

DEM Data
The Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM) was produced jointly by the National Aeronautics and Space Administration (NASA) and the National Geospatial-Intelligence Agency (NGA) and has a spatial resolution of 30 m. It is used to derive terrain parameters, including elevation, slope, aspect, hillshade, elevation profiles, and others.
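How slope, aspect, and hillshade follow from a DEM can be illustrated on a single 3 × 3 elevation window. This sketch uses Horn's finite-difference weights and the common hillshade formula; it is an assumption for illustration only, and GEE's own `ee.Terrain.slope()`, `ee.Terrain.aspect()`, and `ee.Terrain.hillshade()` may use a different neighborhood scheme and different default sun angles. Function names are hypothetical.

```python
import math


def terrain_3x3(window, cell_size=30.0):
    """Slope and aspect (degrees) at the centre of a 3x3 elevation window.

    Rows run north to south and columns west to east. Uses Horn's weights.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)  # rise toward east
    dz_dy = ((a + 2 * b + c) - (g + 2 * h + i)) / (8 * cell_size)  # rise toward north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        aspect = 0.0  # flat cell: aspect is undefined, set to 0 by convention here
    else:
        # Compass direction (clockwise from north) that the downslope faces
        aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


def hillshade(slope_deg, aspect_deg, sun_azimuth=315.0, sun_altitude=45.0):
    """Illumination in 0-255 for a sun at the given azimuth and altitude (degrees)."""
    zenith = math.radians(90.0 - sun_altitude)
    s = math.radians(slope_deg)
    rel = math.radians(sun_azimuth - aspect_deg)
    value = math.cos(zenith) * math.cos(s) + math.sin(zenith) * math.sin(s) * math.cos(rel)
    return 255.0 * max(value, 0.0)


if __name__ == "__main__":
    # Terrain dropping 30 m per 30 m cell toward the east: a 45 degree east-facing slope
    slope, aspect = terrain_3x3([[30, 0, -30], [30, 0, -30], [30, 0, -30]])
    print(round(slope, 2), round(aspect, 2), round(hillshade(slope, aspect), 1))
```

Because aspect is a circular variable (0° and 359° are neighbors), it is sometimes encoded as its sine and cosine before classification; the text uses it directly.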

Sample Data
Field surveys of the main land cover types were conducted in the Erhai Lake Basin during the garlic ripening period, from January to February. Through visual interpretation and on-site investigation, areas with similar colors and texture features were labeled as garlic or other crops. The annual numbers of sample points and their classes are shown in Table 1. Sample collection was carried out on the GEE cloud platform. Taking 2018 as an example, the land cover of the Erhai Lake Basin, which is characterized by Cangshan Mountain and Erhai Lake, was divided into seven categories: construction land; garlic cultivation areas; greenhouses; non-garlic areas; water; forests; and grasslands. The construction (built-up) land class includes 110 samples of houses, roads, factories, and mines.
The non-garlic areas include 110 samples of cultivated land other than garlic and greenhouses, such as succulent and flower plantations. In addition, there are 310 garlic samples, 100 water samples, 145 forest samples, 100 greenhouse samples, and 45 grassland samples, for a total of 920 sample points. To retain enough validation samples for assessing model performance and detecting overfitting, the samples were split into training and validation sets at a ratio of 8:2. Repeating the split and evaluation multiple times allowed the performance and consistency of the model to be verified. Features for garlic identification were selected according to the vegetation cover types, terrain characteristics, and vegetation maturity period of the study area. From Landsat 8 imagery, 40 features were computed, comprising the original spectral bands (B1-B11), spectral indices, terrain features, and texture features. These features have previously been used in land-use classification and were chosen for their relevance to garlic identification [13]. Their details are given in Table 2. The spectral indices include the Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), Normalized Difference Built-up Index (NDBI), Bare Soil Index (BSI), Enhanced Vegetation Index (EVI), and Simple Ratio (SR). Unlike Sentinel-2, Landsat 8 has no red-edge bands, so the red-edge vegetation indices used by You et al. [14] in feature selection could not be included. For texture, the gray-level co-occurrence matrix was used to compute 16 features: entropy (ENT); inverse difference moment (IDM); angular second moment (ASM); variance (VAR); contrast (CONTRAST); correlation (CORR); dissimilarity (DISS); sum average (SAVG); cluster shade (SHADE); difference variance (DVAR); cluster prominence (PROM); inertia (INERTIA); sum variance (SVAR); sum entropy (SENT); difference entropy (DENT); and maximal correlation coefficient (MAXCORR). To prevent overfitting and computational redundancy, only three terrain features were selected: slope (Slope), aspect (Aspect), and hillshade (Hill Shade).
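For reference, several of the spectral indices named above can be computed from Landsat 8 surface reflectance as follows. The formulas are common textbook definitions and are an assumption here, since Table 2 may use other variants (for example, an NDWI based on SWIR instead of green); BAI, gNDVI, and CIg are omitted for brevity. The kNDVI uses its simplified form tanh(NDVI²), which assumes a kernel length scale of 0.5(NIR + Red). The band mapping (B2 blue, B3 green, B4 red, B5 NIR, B6 SWIR1) is that of Landsat 8 OLI; function names are hypothetical.

```python
import math


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def kndvi(nir, red):
    # Simplified kNDVI = tanh(NDVI^2), assuming sigma = 0.5 * (NIR + Red)
    return math.tanh(ndvi(nir, red) ** 2)


def ndwi(green, nir):
    # McFeeters-style NDWI (green vs. NIR); other NDWI variants use SWIR
    return (green - nir) / (green + nir)


def ndbi(swir1, nir):
    return (swir1 - nir) / (swir1 + nir)


def bsi(blue, red, nir, swir1):
    return ((swir1 + red) - (nir + blue)) / ((swir1 + red) + (nir + blue))


def evi(blue, red, nir):
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def simple_ratio(nir, red):
    return nir / red


def indices_from_landsat8(b2, b3, b4, b5, b6):
    """Index values for one pixel from Landsat 8 OLI reflectance (B2-B6)."""
    return {
        "NDVI": ndvi(b5, b4),
        "kNDVI": kndvi(b5, b4),
        "NDWI": ndwi(b3, b5),
        "NDBI": ndbi(b6, b5),
        "BSI": bsi(b2, b4, b5, b6),
        "EVI": evi(b2, b4, b5),
        "SR": simple_ratio(b5, b4),
    }


if __name__ == "__main__":
    # Illustrative reflectance values for a vegetated pixel (not study data)
    for name, value in indices_from_landsat8(0.05, 0.08, 0.10, 0.40, 0.20).items():
        print(f"{name}: {value:.4f}")
```

Because tanh(x²) is flat near NDVI = 0 and grows more slowly than NDVI at high values, kNDVI compresses bare-soil and water responses while keeping dense vegetation distinguishable.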

Gray-Level Co-Occurrence Matrix (GLCM) Algorithm
The gray-level co-occurrence matrix (GLCM) is a statistical tool for describing the texture of digital images and is widely used in image processing, computer vision, and remote sensing image analysis. The GLCM is built from the spatial relationships between gray levels in an image: it records how often pairs of gray-level values occur at a given offset from each other. In this study, the "glcmTexture()" function in GEE was used to calculate the texture features. The neighborhood size parameter "size" of the co-occurrence matrix was set to 1, and the "kernel" that defines the offsets from the center pixel was left at its default. The grayscale image was prepared with three operations: "gray.unitScale(0,0.30)", which linearly maps gray values in the range 0-0.30 onto 0-1; "multiply", which multiplies the pixel values by 100 (so that the range 0-0.30 becomes 0-100); and "toInt()", which converts the pixel values to integers, since glcmTexture() works on integer-valued bands.
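To make the co-occurrence idea concrete, the following sketch builds a symmetric, normalized GLCM for one pixel offset over a small integer image and derives a few of the features listed earlier. It is an illustration under simplifying assumptions, not GEE's implementation: `glcmTexture()` computes its metrics within a neighborhood around every pixel, over several directional offsets that it averages by default, and its exact normalization or logarithm base may differ. Gray levels here are 0-based, which shifts SAVG by a constant relative to the 1-based Haralick definition. Function names are hypothetical.

```python
import math


def glcm(image, dx=1, dy=0, levels=None):
    """Symmetric, normalised GLCM of an integer image for one offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # pair (reference pixel, neighbour)
                counts[j][i] += 1  # mirrored pair, making the matrix symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """A few Haralick-style features from a normalised GLCM p (0-based gray levels)."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row) if v > 0]
    return {
        "asm": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "diss": sum(abs(i - j) * v for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "ent": -sum(v * math.log(v) for _, _, v in cells),
        "savg": sum((i + j) * v for i, j, v in cells),
    }


if __name__ == "__main__":
    checkerboard = [[0, 1], [1, 0]]  # maximally alternating texture
    uniform = [[1, 1], [1, 1]]       # perfectly smooth texture
    print(texture_features(glcm(checkerboard)))
    print(texture_features(glcm(uniform)))
```

The two toy images show the intuition behind the features: a uniform patch has ASM = 1 and zero contrast and entropy, while an alternating pattern raises contrast and entropy and lowers ASM.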
To compute the grayscale image, the red (R), green (G), and near-infrared (NIR) bands of the composite image were linearly combined with weights of 0.3, 0.59, and 0.11, respectively [15]. Such a weighted linear combination is commonly used to convert a color image to grayscale before texture features are extracted. The formula is as follows:

$$\mathrm{Gray} = 0.3R + 0.59G + 0.11\,\mathrm{NIR}$$

where NIR is the near-infrared band, R is the red band, and G is the green band.
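The weighted grayscale conversion with the weights given in the text (0.3 for red, 0.59 for green, 0.11 for NIR) amounts to a per-pixel dot product. In GEE this would typically be written with `ee.Image.expression()`; the plain-Python sketch below only illustrates the arithmetic, and the function names are hypothetical.

```python
def to_gray(red: float, green: float, nir: float) -> float:
    """Weighted grayscale value: 0.3 * R + 0.59 * G + 0.11 * NIR."""
    return 0.3 * red + 0.59 * green + 0.11 * nir


def gray_image(red_band, green_band, nir_band):
    """Apply to_gray pixel by pixel to three equally sized 2-D lists."""
    return [
        [to_gray(r, g, n) for r, g, n in zip(r_row, g_row, n_row)]
        for r_row, g_row, n_row in zip(red_band, green_band, nir_band)
    ]


if __name__ == "__main__":
    # Illustrative reflectances for one vegetated pixel (not study data)
    print(round(to_gray(0.10, 0.08, 0.40), 4))
    print(gray_image([[0.10, 0.20]], [[0.08, 0.15]], [[0.40, 0.25]]))
```

Because the weights sum to 1, the grayscale value stays within the reflectance range of the input bands, so a fixed rescaling range can be applied afterwards.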

Random Forest Algorithm and Feature Selection
Leo Breiman introduced the random forest algorithm in his 2001 paper "Random Forests" [16]. Random forest is an ensemble learning algorithm composed of many decision trees. Each tree is trained on a random sample of the training data drawn with replacement (bootstrap sampling). In addition, only a random subset of features is considered at each node of every tree, which makes the trees distinct from one another, increases the diversity of the forest, improves its generalization capability, and prevents individual features from dominating the predictions. In this study, the random forest algorithm was applied to classify Landsat 5 and Landsat 8 images. In GEE, random forest classifiers can be constructed with the "ee.Classifier.randomForest()" and "ee.Classifier.smileRandomForest()" functions, which are trained and applied after configuring hyperparameters such as the number of trees, the number of features considered per split, and the maximum tree size. The number of decision trees was set to 1000; the RMSE plot for the number of decision trees is provided in Supplementary Figure S2. In addition to the number of trees, five parameters had to be set: the number of variables per split; the minimum leaf population; the bag fraction; the maximum number of leaf nodes; and the seed. The number of variables per split was left unset, so no explicit limit was imposed (in GEE, the default is the square root of the number of input variables). The minimum leaf population, i.e., the minimum number of training points a leaf node must contain, was set to 1. The bag fraction, i.e., the fraction of the input sampled for each tree, was set to 0.5. The maximum number of leaf nodes was left unlimited. The seed of the pseudorandom number generator was left at its default value. Previous studies have found that classification performance may deteriorate once a certain number of feature variables is exceeded [17,18]. To avoid overfitting caused by excessive variables and poor performance caused by computational complexity, the out-of-bag (OOB) data of the random forest were used to rank feature importance, and the top-ranked features were selected for classification to achieve the best classification performance.
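The following sketch shows what a bag fraction of 0.5 and the out-of-bag (OOB) set mean for a single tree. It is an assumption-laden illustration, not GEE's internal code: classic bootstrap sampling draws with replacement, but the SMILE-based implementation behind `smileRandomForest()` may draw the bag differently (for example, without replacement when the fraction is below 1), so both options are exposed through a hypothetical `replace` flag. The square-root rule for the default number of variables per split is likewise assumed to round down. The training-set size of 736 is 80% of the 920 samples described in the text.

```python
import math
import random


def bag_and_oob(n_samples: int, bag_fraction: float = 0.5, replace: bool = True, seed: int = 0):
    """Draw one tree's training bag and return it with the out-of-bag indices."""
    rng = random.Random(seed)
    bag_size = max(1, round(bag_fraction * n_samples))
    if replace:
        bag = [rng.randrange(n_samples) for _ in range(bag_size)]  # bootstrap draw
    else:
        bag = rng.sample(range(n_samples), bag_size)  # subsample without replacement
    oob = sorted(set(range(n_samples)) - set(bag))  # never seen by this tree
    return bag, oob


def default_variables_per_split(n_features: int) -> int:
    """Assumed default: floor of the square root of the number of input variables."""
    return max(1, math.isqrt(n_features))


if __name__ == "__main__":
    n_train = 736  # 80% of the 920 sample points
    bag, oob = bag_and_oob(n_train, bag_fraction=0.5, replace=True, seed=0)
    print(len(bag), len(set(bag)), len(oob))
    print(default_variables_per_split(35))  # 35 features retained after selection
```

OOB samples act as a built-in validation set for each tree, which is why they can be reused to estimate accuracy and feature importance without touching the held-out test samples.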

Accuracy Assessment
In GEE, the validation sample points are combined into a test set named "Test", which is used to compute the confusion matrix of the classifier and output the related performance metrics. The confusion matrix shows the correct and incorrect classifications on the test set and is used to validate the classification accuracy. The evaluation metrics are the user's accuracy (also called consumer's accuracy, CA), producer's accuracy (PA), overall accuracy (OA), and Kappa coefficient. The user's accuracy is the proportion of samples assigned to a class by the classifier that actually belong to that class; the producer's accuracy is the proportion of reference samples of a class that the classifier classifies correctly; and the overall accuracy is the proportion of correctly classified samples in the entire test set. The Kappa coefficient, a key measure of overall classifier performance, quantifies the agreement between the classification and the reference data beyond the agreement expected by chance, so it accounts for correct classifications that would arise from random assignment.
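The four metrics can be computed from a confusion matrix as sketched below. In GEE, `errorMatrix()` on a classified FeatureCollection returns an `ee.ConfusionMatrix` with `accuracy()`, `producersAccuracy()`, `consumersAccuracy()`, and `kappa()` methods; the plain-Python version only makes the arithmetic explicit. The convention assumed here is rows = reference (actual) class and columns = predicted class; the example matrix is toy data, not Table 3.

```python
def accuracy_metrics(matrix):
    """OA, producer's/user's accuracy, and Kappa; matrix[i][j] = reference i predicted as j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]                       # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    diag = [matrix[i][i] for i in range(k)]
    oa = sum(diag) / n
    producers = [d / t if t else float("nan") for d, t in zip(diag, row_totals)]
    users = [d / t if t else float("nan") for d, t in zip(diag, col_totals)]
    # Expected agreement if predictions were assigned at random with the same totals
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return {"OA": oa, "PA": producers, "UA": users, "Kappa": kappa}


if __name__ == "__main__":
    toy = [[50, 10],   # reference class 0: 50 correct, 10 predicted as class 1
           [5, 35]]    # reference class 1: 5 predicted as class 0, 35 correct
    for name, value in accuracy_metrics(toy).items():
        print(name, value)
```

In this toy matrix OA is 0.85 while Kappa is lower (about 0.69), because roughly half of the agreement would be expected by chance given the class totals.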

Feature Selection Analysis
Based on the 2018 remote sensing imagery of the Erhai Lake Basin, this study selected 40 feature variables and applied the random forest algorithm to rank their importance; the results are presented in Figure 3. Figure 3 shows that the importance of each feature variable lies between 0% and 14%. Spectral index features and original band features are among the most important for land-use classification.
Among the texture features, gray_savg has the highest importance, at 11.81%, whereas the maximal correlation coefficient (gray_maxcorr) contributes essentially nothing to land-use classification. Among the spectral indices, the BSI (Bare Soil Index) contributes the most, at 13.75%. Among the terrain features, aspect contributes the most, at 11.69%. The texture features gray_maxcorr, gray_sent, gray_dent, gray_ent, and gray_asm have the least influence on the classification. Of the 40 feature variables, 16 have an importance of 10% or higher. As in Kolluru et al. [19], response curves could also be plotted to show how each variable contributes to predicting the garlic distribution. The variable importance for other years is given in Supplementary Figure S1.
Figure 4 shows the relationship between the number of classification features and the classification accuracy. As the number of features increased, the accuracy first rose, then decreased, rose again, and finally leveled off. The accuracy fluctuated when the number of features was between 10 and 30. As the number of features increased from 5 to 10, the accuracy rose from 0.910 to 0.950. However, until the number of features reached 35, the accuracy did not increase consistently but fluctuated with the number of features. With 35 features, the accuracy peaked at 0.959. As the number of classification features exceeded 45, the accuracy gradually leveled off and stabilized at 0.958. Because additional features reduce computational efficiency, the top 35 features by importance were used. These comprised 11 original spectral features (B11; B8; B4; B10; B5; B1; B3; B9; B6; B2; B7), 10 spectral index features (BSI; SR; gNDVI; BAI; NDBI; NDVI; NDWI; kNDVI; EVI; CIg), 11 texture features (gray_savg; gray_shade; gray_diss; gray_dvar; gray_var; gray_prom; gray_corr; gray_inertia; gray_svar; gray_idm; gray_contrast), and 3 terrain features (aspect; hillshade; slope).
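The selection procedure behind Figure 4 (rank features by importance, evaluate the top-k subsets, keep the best k, and prefer fewer features on ties) can be sketched as follows. The `accuracy_of` callback is a hypothetical placeholder for training and validating a classifier on a feature subset, and the importances and accuracies in the usage example are toy values, not the study's results.

```python
def rank_features(importance):
    """Feature names sorted by importance, highest first."""
    return sorted(importance, key=importance.get, reverse=True)


def choose_feature_count(importance, accuracy_of, candidate_counts):
    """Return (selected features, best k, accuracy per k) for top-k subsets.

    accuracy_of(features) is a caller-supplied function (hypothetical here) that
    trains and validates a classifier on the given features. Ties are broken in
    favour of fewer features to save computation.
    """
    ranked = rank_features(importance)
    scores = {k: accuracy_of(ranked[:k]) for k in candidate_counts}
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return ranked[:best_k], best_k, scores


if __name__ == "__main__":
    toy_importance = {"BSI": 0.14, "gray_savg": 0.12, "aspect": 0.11, "gray_maxcorr": 0.0}
    toy_accuracy = {1: 0.80, 2: 0.90, 3: 0.90, 4: 0.85}  # illustrative curve only
    selected, k, scores = choose_feature_count(
        toy_importance, lambda feats: toy_accuracy[len(feats)], [1, 2, 3, 4]
    )
    print(k, selected, scores)
```

Breaking ties toward smaller k mirrors the trade-off stated above: when extra features do not raise accuracy, the cheaper subset is kept.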

Accuracy Analysis
The confusion matrix for the 2018 classification with feature selection is presented in Table 3. The overall accuracy is 95.79%, and the Kappa coefficient is 0.95. The user's accuracy of every land class is above 90%. The producer's accuracies of garlic, water bodies, built-up areas, forests, greenhouses, and grasslands also exceed 90%, whereas that of the non-garlic class is lower, at 89.25%. This is mainly because the non-garlic class includes cultivated land other than garlic and greenhouses, such as succulent and flower plantations, which may not have been labeled precisely during sample collection; their spectral similarity in the imagery causes mutual confusion and lowers the accuracy of this class. The classes with the best results are garlic, water bodies, and forests. For garlic, the producer's (mapping) accuracy and user's accuracy are 99.16% and 96.71%, respectively, meeting high classification standards. Over the past five years, both the overall accuracy and the Kappa coefficient have remained above 90%, indicating a stable and satisfactory classification level, good model performance, and effective training, as shown in Figure 5.

Note: A total of 235 "Garlic" samples were correctly classified as "Garlic", and 4 "Construction land" samples were misclassified as "Garlic"; 99.16% of the samples that were actually garlic were correctly classified, and 96.71% of the samples predicted as "Garlic" by the model were indeed garlic.

Classification Analysis
Following the steps above, including feature selection, the remote sensing images from 1999, 2005, 2010, 2014, 2018, and 2023 were processed in turn to obtain the garlic planting distribution in the Erhai Lake Basin over the past two decades, as illustrated in Figure 6. From 1999 to 2005, garlic was planted mainly in the upstream part of the basin and in the western region. By 2010, as garlic prices declined, the planting area decreased markedly and was concentrated mainly in the western and northwestern parts of the basin. By 2014, influenced by policies, garlic planting had shifted toward the northwest of the basin. This trend continued until 2018, leaving a minor cultivation area in the western region, while cultivation was concentrated mainly in five townships in the northern part of the basin. By 2023, garlic cultivation had essentially disappeared from the Erhai Lake Basin: the garlic area identified from the imagery was nearly zero.
To show the garlic cultivation area of each township in the Erhai Lake Basin, township-level statistics of the garlic area identified from the imagery were mapped; the resulting classification maps are presented in Figure 7. In 1999, garlic planting was concentrated mainly in several townships in the northern and western parts of the basin. By 2005, the cultivation area had gradually expanded, and townships with more than 8 km² of garlic accounted for three-eighths of all townships in the basin. By 2010, inflation had led to a decline in garlic prices, which reduced the garlic area in townships across the basin, with none exceeding 6 km². In 2014, garlic cultivation gradually rebounded, and its distribution shifted toward the northern part of the basin as the planting area grew. By 2018, garlic cultivation was concentrated predominantly in the northern part of the basin. Overall, garlic cultivation in the Erhai Lake Basin began in the western region and then spread toward the upstream areas, while the eastern part of the basin had the smallest garlic area. Because garlic consumes large amounts of water and fertilizer, it is cultivated mainly around Erhai Lake, where water resources are abundant.
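Township statistics such as "more than 8 km² of garlic" come from converting classified pixel counts to area. A minimal sketch, assuming 30 m pixels (0.0009 km² each); in GEE, such totals are commonly obtained by summing `ee.Image.pixelArea()` over garlic pixels within each township polygon with a region reducer. The township names and areas below are hypothetical.

```python
def garlic_area_km2(pixel_count: int, pixel_size_m: float = 30.0) -> float:
    """Area in km^2 covered by pixel_count classified pixels of the given size."""
    return pixel_count * pixel_size_m * pixel_size_m / 1_000_000


def share_above(areas_by_township, threshold_km2):
    """Fraction of townships whose garlic area exceeds the threshold."""
    above = sum(1 for area in areas_by_township.values() if area > threshold_km2)
    return above / len(areas_by_township)


if __name__ == "__main__":
    print(garlic_area_km2(10_000))  # 10,000 pixels of 30 m
    toy = {"T1": 9.0, "T2": 2.0, "T3": 8.5, "T4": 1.0,
           "T5": 0.0, "T6": 3.0, "T7": 10.0, "T8": 4.0}  # hypothetical townships
    print(share_above(toy, 8.0))
```

At 30 m resolution, 8 km² corresponds to roughly 8,900 garlic pixels, so township totals are robust to a few misclassified pixels but sensitive to systematic confusion with similar crops.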
The center of gravity method [20] and the standard deviational ellipse method [21] were used to calculate the center of gravity and the standard deviational ellipse of garlic cultivation in the Erhai Lake Basin from 1999 to 2018 (see Figure 8 and Table 4). From 1999 to 2010, the center of gravity of garlic cultivation moved toward the southeast; from 2010 to 2014, it moved toward the northeast; and from 2014 to 2018, it moved toward the southwest. Thus, the eastward movement of the center of gravity slowed from 2010 to 2018, and its direction changed in 2014. In a standard deviational ellipse, the major axis indicates the main direction of the distribution, the minor axis indicates its range, and the ratio of the major to the minor axis indicates how directional the expansion is; a ratio close to 1 indicates no clear directionality. From 1999 to 2018, the major-to-minor-axis ratio always exceeded 2, indicating pronounced directionality. From 1999 to 2010, the ratio decreased from 4.3 to 4 and then increased to 4.2, indicating that the directionality of garlic expansion first weakened and then strengthened during this period. By 2018, the ratio had further decreased to 3.64, indicating that the directionality continued to weaken. In addition, the minor axis of the ellipse lengthened from 2005 to 2018, indicating that the distribution range of garlic cultivation expanded.

Note: CenterX and CenterY, the X and Y coordinates of the ellipse center; XStdDist and YStdDist, the standard deviations along the X and Y axes, indicating the spread of the data in each direction; Rotation, the rotation angle of the ellipse relative to the original coordinate axes; and XStdDist/YStdDist, the ratio of the two standard deviations, describing the shape of the ellipse.
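The center of gravity and standard deviational ellipse can be sketched as a weighted mean and an eigen-decomposition of the weighted covariance matrix, with weights such as the garlic area of each patch. This is one common formulation and an assumption here: GIS tools (for example, ArcGIS's Directional Distribution) use closely related formulas whose axis lengths may differ by a constant scaling, so absolute values need not match Table 4, although the orientation and axis ratio follow the same logic. Rotation is reported clockwise from north, and function names are hypothetical.

```python
import math


def mean_center(points, weights):
    """Weighted center of gravity of (x, y) points."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y


def std_ellipse(points, weights):
    """One-standard-deviation ellipse from the weighted covariance matrix."""
    cx, cy = mean_center(points, weights)
    total = sum(weights)
    sxx = sum(w * (px - cx) ** 2 for (px, _), w in zip(points, weights)) / total
    syy = sum(w * (py - cy) ** 2 for (_, py), w in zip(points, weights)) / total
    sxy = sum(w * (px - cx) * (py - cy) for (px, py), w in zip(points, weights)) / total
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + root)            # sqrt of larger eigenvalue
    minor = math.sqrt(max(half_trace - root, 0.0))  # sqrt of smaller eigenvalue
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)    # major axis, radians from east
    rotation = (90.0 - math.degrees(theta)) % 180.0  # degrees clockwise from north
    ratio = major / minor if minor > 0 else math.inf
    return {"center": (cx, cy), "major": major, "minor": minor,
            "ratio": ratio, "rotation": rotation}


def center_shift(old, new):
    """Distance and compass bearing (clockwise from north) of a center move."""
    dx, dy = new[0] - old[0], new[1] - old[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dx, dy)) % 360.0


if __name__ == "__main__":
    pts = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]  # toy coordinates
    print(std_ellipse(pts, [1, 1, 1, 1]))
    print(center_shift((0.0, 0.0), (1.0, -1.0)))  # a move toward the southeast
```

An axis ratio near 1 means the weighted spread is similar in all directions, matching the interpretation in the text that larger ratios indicate stronger directional expansion.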

Discussion
Most agricultural remote sensing studies focus on identifying cereal crops, and relatively little research, in China or elsewhere, has addressed the remote sensing identification of economic crops such as tobacco, rubber, tea, and garlic. Current research on garlic extraction mainly combines phenological periods with machine learning algorithms. For example, Wu Shuang et al. used Sentinel-2 images covering the entire growth cycle of garlic and improved garlic identification by combining multiple temporal phases in different ways [22]. Other studies built garlic classification models for different growth stages with convolutional neural networks; using high-resolution images and deep learning, they monitored garlic yield throughout the growth period [23].
Regarding classification methods, Indonesian researchers applied k-nearest neighbor and maximum likelihood classification and compared them with pixel-based and image-based garlic classification results from previous studies; they found that k-nearest neighbor classification outperformed support vector machine and maximum likelihood classification [24]. Ma Zhanlin et al. combined the random forest algorithm with an object-oriented approach, added index features, and used simple non-iterative clustering (SNIC) to select the optimal segmentation scale for garlic extraction, reaching an overall accuracy of 94.54% and a Kappa coefficient of 0.93. This result is consistent with the good classification results of Tian Haifeng et al., who identified garlic and winter wheat using active and passive remote sensing [25]. However, garlic research has concentrated mainly on northern China, such as Shandong, and there has been almost no research on garlic identification in Yunnan. This study used Landsat imagery on the GEE platform to identify garlic in the Erhai Lake Basin, which greatly reduces the effort of data acquisition and preprocessing. Classification performance improved compared with previous studies: the overall accuracy increased by approximately 1.3 percentage points, and the Kappa coefficient increased by about 0.02. In addition to adding to the literature on garlic in the Erhai Lake Basin, this study validates the applicability of feature selection combined with a GEE-based random forest classification model for garlic mapping.
In most studies of feature selection, spectral features and vegetation index features play a dominant role. Spectral bands such as B8 and B11 rank high in feature importance, followed by texture features and, lastly, red-edge spectral indices [26][27][28]. Few feature selection studies using Landsat imagery incorporate both texture and terrain features. However, related studies indicate that texture and terrain features play an important role in land-use classification [29], and the response to texture features becomes more pronounced as land-use types become more complex [30]. In this study, four terrain features were highly correlated, which could affect the classification results; therefore, only a subset of them was included in the analysis. To suppress noise and other interference, a combination of median and Gaussian filtering was applied. In addition, the kNDVI was added because it handles noise better, alleviates saturation, and reduces "background effects" (such as soil, sparse vegetation, and water) [31]. This effectively addresses the saturation and mixed-pixel problems of traditional indices. Beyond vegetation monitoring, the kNDVI helps improve the quantification and understanding of photosynthesis at the global scale, with applications in change and anomaly detection, phenology, and greening studies, among others. The kNDVI has also been found to be more stable and robust than the traditional NDVI and NIRv under various environmental conditions, such as dense forests, grasslands, and mixed forests [32,33]. In the classification conducted in this study, the kNDVI played a significant

Figure 1. Geographic location of the study area.

Figure 3. Feature importance rankings as estimated by the permutation-based measure. Note: the remaining materials are provided in the Supplementary Materials of this article.

Figure 4. Relationship between feature dimension and accuracy. Note: the curve shows the change in classification accuracy as the number of features increases.

Figure 5. Overall accuracy and Kappa coefficient from 1999 to 2023.

Figure 8. Migration of the center of gravity of garlic cultivation in the Erhai Lake Basin and the standard deviational ellipse.

Table 1. Number of sample points in the Erhai Lake Basin from 1999 to 2023.

Table 3. Confusion matrix for the classification with feature selection.

Table 4. Standard deviational ellipse parameters for garlic in the Erhai Lake Basin from 1999 to 2018.