Article

A Machine Learning-Validated Comparison of LAI Estimation Methods for Urban–Agricultural Vegetation Using Multi-Temporal Sentinel-2 Imagery in Tashkent, Uzbekistan

by Bunyod Mamadaliev 1,*, Nikola Kranjčić 2,*, Sarvar Khamidjonov 1 and Nozimjon Teshaev 3
1 EcoGIS Center, “Tashkent Institute of Irrigation and Agricultural Mechanization Engineers” National Research University, 39 Kori Niyoziy Street, Tashkent 100000, Uzbekistan
2 Department of Geodesy and Geomatics, University North, 42000 Varaždin, Croatia
3 Department of Geodesy and Geoinformatics, “Tashkent Institute of Irrigation and Agricultural Mechanization Engineers” National Research University, 39 Kori Niyoziy Street, Tashkent 100000, Uzbekistan
* Authors to whom correspondence should be addressed.
Land 2026, 15(2), 232; https://doi.org/10.3390/land15020232
Submission received: 26 December 2025 / Revised: 20 January 2026 / Accepted: 28 January 2026 / Published: 29 January 2026

Abstract

Accurate estimation of Leaf Area Index (LAI) is essential for monitoring vegetation structure and ecosystem services in urban and peri-urban environments, particularly in small, heterogeneous patches typical of semi-arid cities. This study presents a comparative assessment of four empirical LAI estimation methods—NDVI-based, NDVI-advanced, SAVI-based, and EVI-based methods—applied to atmospherically corrected Sentinel-2 Level-2A imagery (10 m spatial resolution) over a 0.045 km2 urban–agricultural polygon in the Tashkent region, Uzbekistan. Multi-temporal observations acquired during the 2023 growing season (June–August) were used to examine intra-seasonal vegetation dynamics. In the absence of field-measured LAI, a Random Forest regression model was implemented as an inter-method consistency analysis to assess agreement among index-derived LAI estimates rather than to perform external validation. Statistical comparisons revealed highly systematic and practically significant differences between methods, with the EVI-based approach producing the highest and most dynamically responsive LAI values (mean LAI = 1.453) and demonstrating greater robustness to soil background and atmospheric effects. Mean LAI increased by 66.7% from June to August, reflecting irrigation-driven crop phenology in the semi-arid study area. While the results indicate that EVI provides the most reliable relative LAI estimates for small urban–agricultural patches, the absence of ground-truth data and the influence of mixed pixels at 10 m resolution remain key limitations. This study offers a transferable methodological framework for comparative LAI assessment in data-scarce urban environments and provides a basis for future integration with field measurements, higher-resolution imagery, and LiDAR-based 3D vegetation models.

1. Introduction

Urban vegetation is a fundamental component of sustainable cities, directly contributing to critical ecosystem services such as climate regulation, air purification, and human psychological well-being [1,2]. Quantifying these benefits requires accurate measurement of vegetation structure, for which the Leaf Area Index (LAI)—defined as the total one-sided leaf area per unit ground area—serves as a key biophysical variable [3]. Reliable LAI data enables urban planners to optimize green space design, manage water resources, and mitigate urban heat islands effectively [4].
Traditionally, LAI is derived from satellite data using spectral vegetation indices (VIs). The Normalized Difference Vegetation Index (NDVI) is the most widely used, but it is known to be sensitive to confounding factors like soil brightness and atmospheric conditions, particularly in sparse vegetation [5]. To address these issues, adjusted indices such as the Soil-Adjusted Vegetation Index (SAVI) and the Enhanced Vegetation Index (EVI) were developed [6,7]. However, their relative performance is highly context-dependent, varying with vegetation type, density, and landscape heterogeneity. This context dependence becomes a significant challenge in urban environments, where vegetation exists in small, fragmented patches intermixed with impervious surfaces and bare soil [8,9]. In such settings, the spectral signal is complex, and standard VIs may perform inconsistently. This issue is especially pronounced in small urban–agricultural plots (often <0.05 km2), which are vital for local food production and ecology but are difficult to monitor with medium-resolution satellites due to mixed pixels and a lack of representative ground data for calibration [10,11]. Consequently, there is an ongoing debate and a notable research gap regarding the most robust method for estimating LAI in these small, heterogeneous urban–agricultural areas, particularly in understudied semi-arid regions [12]. Recent advancements in machine learning, such as Random Forest regression, offer powerful tools for modeling complex relationships and validating remote sensing products, even in the absence of extensive in situ measurements, by assessing internal consistency and method stability [13,14,15,16]. To address these challenges, this study conducts a comprehensive empirical comparison of four established LAI estimation methods—NDVI-based, SAVI-based, EVI-based, and an advanced NDVI approach—applied to high-resolution Sentinel-2 imagery. 
We focus on a small (0.045 km2) urban–agricultural polygon in Tashkent, Uzbekistan, analyzing temporal dynamics across a growing season. The principal aim is to identify the most accurate and reliable method for this critical yet challenging-to-monitor landscape.
Over the past two decades, a large body of research has applied optical remote sensing to leaf area index (LAI) retrieval using vegetation index-based methods in a variety of environments. Pioneering studies established the theoretical and applied links between spectral vegetation indices and the biophysical properties of the canopy, illustrating both the potential and the limitations of these indices [1,2,12]. Subsequent research extended these methods to more challenging environments, such as urban and peri-urban areas, where landscape heterogeneity, isolated vegetation patches, and mixed pixels make LAI retrieval difficult [17]. Recent studies have shown that soil and atmospheric effects can be reduced in heterogeneous and semi-arid environments by using the Soil-Adjusted Vegetation Index (SAVI) and the Enhanced Vegetation Index (EVI) [7,18,19]. However, comparative studies specifically targeting very small urban–agricultural patches (area < 0.05 km2) are still lacking, particularly in the data-scarce environments of Central Asia, where in situ calibration data are often unavailable. This justifies a methodologically focused comparison that prioritizes relative robustness over absolute accuracy.
Despite the extensive use of vegetation indices for LAI retrieval, no general consensus has emerged on the most reliable empirical method for small, heterogeneous urban–agricultural areas, particularly in semi-arid zones characterized by soil exposure and irrigation. Moreover, many comparison studies have either treated index-derived LAI values as ground truth or relied on only a small amount of ground calibration data. To address this gap, the present study compares four of the most common empirical LAI retrieval methods (NDVI-based, NDVI-advanced, SAVI-based, and EVI-based), applied uniformly to multi-temporal Sentinel-2 images. Rather than relying on ground-validation data, machine learning is used as an exploratory tool to assess the relative performance of each method internally under identical image preprocessing. Using a small (0.045 km2) urban–agricultural polygon typical of cities in the semi-arid climate of Central Asia, the study aims to identify the most robust relative LAI estimation method for vegetation monitoring, given the inherent limitations of medium-resolution remote sensing.

2. Materials and Methods

This study was conducted following an observational, remote sensing-based methodology. All data and code used in this analysis are publicly available, with details provided in the respective sections below. No ethical approval was required as the research did not involve human or animal subjects.

2.1. Study Area

The study focuses on a designated urban–agricultural polygon within the Tashkent region of Uzbekistan (Figure 1). Established for innovation and training under the Tashkent Institute of Irrigation and Agricultural Mechanization Engineers National Research University, the polygon is defined by the following coordinates: (1) 41.0377°N, 69.2500°E; (2) 41.0370°N, 69.2507°E; (3) 41.0342°N, 69.2467°E; and (4) 41.0348°N, 69.2459°E. Encompassing an area of 0.045 km2, the site is located 10–15 km southwest of Tashkent city in the fertile Chirchik River basin. The region experiences a continental semi-arid climate with approximately 300–400 mm of annual precipitation [14]. The landscape is characterized by a fine-grained mosaic of cultivated plots (e.g., cotton, wheat, and vegetables), fallow soil, and scattered low-density structures, representing a typical small, heterogeneous urban–agricultural interface in Central Asia, where irrigation is essential.
This fine-scale mosaic of cultivated plots, exposed soil, and sparse built structures is representative of transitional urban–agricultural landscapes that are increasingly important for urban food production, microclimate regulation, and green infrastructure planning in semi-arid cities.

2.2. Data Acquisition and Preprocessing

Satellite imagery was acquired from the European Space Agency’s (ESA) Sentinel-2 Multispectral Instrument (MSI). We utilized Level-2A (Bottom-of-Atmosphere reflectance) products with a spatial resolution of 10 m for three cloud-free (<15% cloud cover) dates during the 2023 growing season: 23 June, 13 July, and 17 August (Table 1). The Level-2A products, processed with the Sen2Cor atmospheric correction algorithm [15], were accessed via the Copernicus Open Access Hub. No further atmospheric correction was applied. The bands used for index calculation (Red: B4; Near-Infrared: B8; Blue: B2) were extracted, and the study area was clipped using the defined polygon boundary in a geographic information system (GIS) environment. Because all Level-2A products are generated with the same harmonized atmospheric correction algorithm, the data are radiometrically consistent across acquisition dates, and the effect of atmospheric variability on surface reflectance is reduced. This consistency is critical for reliable inter-method and multi-date comparisons of vegetation indices in heterogeneous environments that include both urban and agricultural areas.
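The index calculation from the three extracted bands can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the band arrays have already been scaled to surface reflectance in the 0–1 range, uses the soil factor L = 0.5 mentioned later for SAVI, and uses the commonly published EVI coefficients (2.5, 6, 7.5, 1), which the text does not state explicitly. The function name and the sample reflectances are hypothetical.

```python
import numpy as np


def vegetation_indices(blue, red, nir, soil_factor=0.5):
    """Return NDVI, SAVI and EVI arrays from surface reflectance (0-1 scale).

    Assumption: inputs are Sentinel-2 B2 (blue), B4 (red) and B8 (NIR)
    already converted from digital numbers to reflectance.
    """
    blue = np.asarray(blue, dtype=float)
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)

    # NDVI: normalized difference of NIR and red reflectance.
    ndvi = (nir - red) / (nir + red)
    # SAVI: same contrast with a soil-brightness adjustment factor L.
    savi = (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    # EVI: commonly used coefficients (G=2.5, C1=6, C2=7.5, L=1); assumed here.
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return ndvi, savi, evi


# Hypothetical reflectances for three pixels: bare soil, sparse crop, dense crop.
b2 = np.array([0.08, 0.05, 0.03])
b4 = np.array([0.20, 0.10, 0.04])
b8 = np.array([0.25, 0.30, 0.45])

ndvi, savi, evi = vegetation_indices(b2, b4, b8)
for name, values in (("NDVI", ndvi), ("SAVI", savi), ("EVI", evi)):
    print(name, np.round(values, 3))
```

Pixels where the denominator is zero (for example, masked no-data values) would need to be excluded before the division; that step is omitted here for brevity.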

2.3. Vegetation Indices and LAI Estimation Methods

Four empirical methods for estimating the Leaf Area Index (LAI) were applied, each based on a different vegetation index (VI). Figure 2 presents the full methodological workflow. The coefficients of the empirical formulas were adopted from established literature and were not adjusted locally, because no in situ LAI measurements were available for calibration. This is a common constraint in operational urban and peri-urban remote sensing studies, especially in data-scarce areas, but it has significant implications for interpretation. The absolute LAI values presented here may differ from the actual canopy values, because vegetation type, management, soil type, and climate differ from the conditions under which the original models were derived. However, since all four LAI estimation approaches were applied to the same atmospherically corrected Sentinel-2 images, the comparison framework retains internal consistency. Under such controlled conditions, differences between the approaches mainly reflect their relative robustness to background and atmospheric effects, differences in index formulation, and non-linear relationships with canopy density, rather than differences in the input data. Hence, the results remain valid for comparing the relative performance and internal consistency of the approaches, even though absolute biological accuracy cannot be guaranteed.
The primary value of this research therefore lies in the methodological comparison rather than in deriving absolute LAI values. The results should be interpreted as indicating which empirical method is most robust in practice for relative LAI measurement. Future research should calibrate such estimates against local LAI data and incorporate finer-resolution imagery to improve absolute accuracy.
The VIs were then converted to LAI using the following empirical relationships (Table 2).

2.4. Statistical Analysis

To compare the outputs of the four LAI estimation methods, a comprehensive statistical analysis was performed on the pixel-level data pooled across all three dates. A one-way Analysis of Variance (ANOVA) was conducted to test statistically significant differences between the mean LAI values estimated by the four methods, with a significance level of p < 0.05. Where ANOVA indicated significance, post hoc pairwise t-tests with Bonferroni correction were applied to identify which specific method pairs differed. Descriptive statistics (mean, standard deviation, median, and range) were computed for each method’s output to summarize central tendency and variability. Temporal trends across the growing season were analyzed by calculating the mean LAI for each method and date.
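The testing sequence described above (one-way ANOVA, effect size, Bonferroni-corrected pairwise t-tests) can be sketched with SciPy. The data below are synthetic placeholders, not the study's pixel values, and the choice of independent two-sample t-tests is an assumption, since the text does not say whether the pairwise tests were paired or unpaired.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for pixel-level LAI per method (NOT the study's data).
lai = {
    "NDVI-Basic": rng.normal(0.3, 0.15, 500),
    "NDVI-Advanced": rng.normal(0.6, 0.20, 500),
    "SAVI": rng.normal(1.2, 0.30, 500),
    "EVI": rng.normal(1.5, 0.40, 500),
}
groups = list(lai.values())

# One-way ANOVA across the four methods.
f_stat, p_value = stats.f_oneway(*groups)

# Eta squared = between-group sum of squares / total sum of squares.
pooled = np.concatenate(groups)
grand_mean = pooled.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((pooled - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

# Post hoc pairwise t-tests; Bonferroni multiplies each p-value by the
# number of comparisons (6 pairs for 4 methods) and caps it at 1.
pairs = list(itertools.combinations(lai, 2))
adjusted = {}
for a, b in pairs:
    _, p = stats.ttest_ind(lai[a], lai[b])
    adjusted[(a, b)] = min(1.0, p * len(pairs))

print(f"F = {f_stat:.1f}, p = {p_value:.3g}, eta^2 = {eta_squared:.3f}")
for pair, p_adj in adjusted.items():
    print(pair, f"{p_adj:.3g}")
```

Reporting eta squared alongside the p-values matters here because, with thousands of pixels, even trivial differences become statistically significant.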
Because the analysis was carried out at the pixel level using medium-resolution Sentinel-2 imagery, individual values are spatially autocorrelated: neighboring pixels have similar spectral characteristics, so observations are spatially clustered and the assumption of statistical independence is violated. With large pixel-level samples, this inflates the risk of type I errors. The statistical tests used here should therefore not be read as strict hypothesis tests of method differences. Instead, they are used to characterize relative effect sizes, differences in distributions, and methodological differences between LAI estimation approaches at the spatially continuous level of satellite observation, which is common practice in the spatial analysis of remote sensing data.
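The degree of spatial autocorrelation in a raster can be quantified with Moran's I. The sketch below is an illustration only and was not part of the reported analysis; it assumes a rectangular grid with no missing pixels and rook (edge-sharing) neighbours, and the function name `morans_i` and both test grids are hypothetical.

```python
import numpy as np


def morans_i(grid):
    """Moran's I for a 2-D array using rook (edge-sharing) neighbours."""
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    # Cross-products over horizontally and vertically adjacent pixel pairs.
    cross = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    # With binary symmetric weights, N/W * sum(w_ij z_i z_j) / sum(z_i^2)
    # reduces to (N / n_pairs) * cross / sum(z^2).
    return (z.size / n_pairs) * cross / (z ** 2).sum()


rng = np.random.default_rng(1)
smooth = np.add.outer(np.arange(20.0), np.arange(20.0))  # smooth gradient
noise = rng.normal(size=(30, 30))                        # spatially random

i_smooth = morans_i(smooth)
i_noise = morans_i(noise)
print(f"gradient: {i_smooth:.3f}, noise: {i_noise:.3f}")
```

Values near 1 indicate that neighbouring pixels carry largely redundant information, which is why the effective sample size is much smaller than the pixel count.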

2.5. Machine Learning Validation and Feature Importance

Because no field LAI data suitable for direct validation were available, Random Forest (RF) regression was applied as an exploratory inter-method consistency analysis rather than as ground-truth validation. The aim of this internal analysis was to assess agreement, redundancy, and relative importance among the four empirical LAI models (EVI-based, SAVI-based, NDVI-Advanced, and NDVI-Basic), all applied identically to the multi-temporal Sentinel-2 imagery.
To apply this approach, a consensus LAI for a given pixel and date of acquisition was considered to be the median of the four LAI estimates obtained from the index methods. The median was preferred as a measure of central tendency, as it is less sensitive to outliers and biases in methods. The RF algorithm was then applied to predict the consensus LAI using the four LAI estimates as predictors. Under this formulation, model performance reflects the internal coherence among methods and their ability to represent a shared vegetation signal, rather than accuracy with respect to true LAI.
When there are no independent measurements of leaf area index (LAI), the “consensus LAI” is used as a useful proxy for comparisons among methods, but not as a measure of truth. The median of the four estimates is used since it is less affected by the influence of extreme data points compared to the mean but captures the central tendency of the set of estimates under the same preprocessing. In this scheme, the Random Forest algorithm does not make a statement on the accuracy of the biophysical estimates but instead estimates the extent to which the estimates consistently express a common central value and provides a measure of redundancy among the correlated estimates. It is explicitly acknowledged that this approach involves an inherent circularity, as the consensus LAI target is derived from the same estimates used as predictors. As a result, high R2 values should not be interpreted as agreement with ground truth, but rather as an indicator of internal consistency and redundancy among correlated LAI estimation methods.
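A minimal sketch of the consensus target, assuming the four estimates are stored as columns of an array; the values are hypothetical and chosen so that one row contains an extreme EVI-based estimate.

```python
import numpy as np

# Columns: NDVI-Basic, NDVI-Advanced, SAVI, EVI (hypothetical values).
estimates = np.array([
    [0.20, 0.45, 1.10, 1.30],
    [0.35, 0.60, 1.25, 1.60],
    [0.10, 0.30, 0.95, 3.50],  # last row contains an extreme EVI value
])

# Per-pixel consensus: median across the four method estimates.
consensus = np.median(estimates, axis=1)
# Per-pixel mean, shown only for comparison with the median.
mean_lai = estimates.mean(axis=1)

print("median:", consensus)
print("mean:  ", mean_lai)
```

With an even number of estimates, the median equals the average of the two middle values, so for each pixel the consensus is determined entirely by the second- and third-ranked methods. This is one concrete way in which the target depends on the predictors.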
The dataset, consisting of all valid pixels over the acquisition dates, was randomly split into subsets for training (80%) and testing (20%) purposes. Five-fold cross-validation of the training dataset was performed to improve the generality of the results. For the RF algorithm, the number of decision trees used was 100 (n_estimators = 100), the maximum depth of the tree was 10 (max_depth = 10), and the minimum number of samples required to split an internal node was five (min_samples_split = 5).
On the hold-out test data, the performance of the models was evaluated using the Coefficient of Determination (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). These metrics quantify the degree to which the ensemble of LAI estimation methods coherently reproduces the consensus LAI and should be interpreted as indicators of inter-method agreement rather than absolute biophysical accuracy. High performance on these metrics simply reflects common spectral information in vegetation indices and their similar response to canopy and background variables, and hence, they do not serve as a validation of LAI accuracy.
Further, the intrinsic importance of the features of the RF model (Gini importance) was assessed to quantify the contribution of each method for estimating the LAI to the ensemble consensus LAI. This analysis provides insight into statistical contribution within the ensemble but does not imply that methods with higher importance are necessarily more physically accurate or biophysically realistic.
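The workflow in this section can be sketched with scikit-learn using the hyperparameters reported above. The input data are synthetic and only mimic four correlated LAI estimates; the random seeds are assumptions, since none are reported, and the printed numbers will not match the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(42)
names = ["NDVI-Basic", "NDVI-Advanced", "SAVI", "EVI"]

# Synthetic, correlated LAI estimates standing in for real pixels.
signal = rng.uniform(0.0, 1.0, 1000)
X = np.column_stack([
    0.6 * signal + rng.normal(0, 0.05, 1000),
    1.0 * signal + rng.normal(0, 0.08, 1000),
    0.8 + 1.0 * signal + rng.normal(0, 0.10, 1000),
    0.8 + 2.0 * signal + rng.normal(0, 0.15, 1000),
])
# Consensus target: per-pixel median of the predictors (circular by design).
y = np.median(X, axis=1)

# 80/20 split; random_state is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(
    n_estimators=100, max_depth=10, min_samples_split=5, random_state=42
)

# Five-fold cross-validation on the training subset only.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit and evaluation on the hold-out test set.
model.fit(X_train, y_train)
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
mae = float(mean_absolute_error(y_test, pred))

# Impurity-based importances, normalized to sum to 1.
importance = dict(zip(names, model.feature_importances_))

print(f"CV R2 = {cv_r2.mean():.3f}, test R2 = {r2:.3f}, "
      f"RMSE = {rmse:.3f}, MAE = {mae:.3f}")
print(importance)
```

Because the target is computed from the predictors, high scores in a setup like this are expected and say nothing about agreement with field-measured LAI.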

3. Results

3.1. Descriptive Statistics of LAI Estimates

The Leaf Area Index (LAI) was estimated for the small urban–agricultural polygon using four distinct empirical methods. A total of 5364 pixel-based observations (1341 pixels per method across three dates) were analyzed. The descriptive statistics for each method are summarized in Table 3. The substantially lower LAI values produced by NDVI-based methods are consistent with known saturation and soil sensitivity effects in sparse or mixed vegetation conditions.
The EVI-based method yielded the highest mean LAI (1.453) and the widest range (0.770–3.504), indicating high sensitivity to vegetation variability. The SAVI-based method produced a slightly lower but comparable mean (1.247) with a narrower range. In contrast, the two NDVI-derived methods resulted in substantially lower estimates, with the NDVI-Basic method showing the lowest mean (0.2920) and minimum value (0.000). All distributions showed positive skewness, suggesting a prevalence of moderate LAI values with a right-tailed spread of higher values. Figure 3 shows the distribution of LAI values from the four empirical approaches summarized in Table 3.

3.2. Statistical Comparison of Methods

A one-way Analysis of Variance (ANOVA) confirmed that the differences in mean LAI values produced by the four methods were highly statistically significant (F(3, 16,088) = 9821.4, p < 0.000001). The effect size was very large (η2 = 0.647), indicating that the estimation method explained 64.7% of the variance in the LAI values (Figure 4). Post hoc pairwise t-tests with Bonferroni correction confirmed that every pair of methods differed significantly from one another (all p < 0.000001), establishing that each method provides a statistically distinct LAI estimate for this landscape. Because the LAI estimates were obtained from spatially contiguous pixels, the observations are spatially autocorrelated, and the independence assumption of classical ANOVA is not satisfied. The p-values are therefore treated as indicators that the methods differ rather than as exact probabilities under the null hypothesis, and greater weight is given to effect sizes.

3.3. Temporal LAI Dynamics Across the Growing Season

The temporal analysis revealed clear seasonal growth patterns. The mean LAI (pooled across all four methods to represent a general trend) increased consistently from June to August 2023 (Table 4 and Figure 7).
The mean LAI increased steadily from 0.661 in June to 0.898 in July (+35.8%) and further to 1.102 in August, marking a cumulative increase of 66.7% from the baseline. The standard deviation also increased over time, indicating greater spatial variability in LAI as the growing season progressed.
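The percentage changes follow directly from the pooled means. The short check below uses the rounded means quoted in the text, so results can differ from the reported figures in the last decimal place.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


# Pooled mean LAI per date, as reported in the text (rounded values).
mean_lai = {"June": 0.661, "July": 0.898, "August": 1.102}

june_to_july = percent_change(mean_lai["June"], mean_lai["July"])
june_to_august = percent_change(mean_lai["June"], mean_lai["August"])

print(f"June to July:   {june_to_july:.2f}%")
print(f"June to August: {june_to_august:.2f}%")
```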

3.4. Machine Learning Validation

The Random Forest regression model, trained on the four LAI estimates as features, demonstrated robust performance. Using an 80–20 train–test split and 5-fold cross-validation, the model achieved a coefficient of determination (R2) of 0.774, a Root Mean Square Error (RMSE) of 0.277, and a Mean Absolute Error (MAE) of 0.201 (Table 5). Figure 5 shows the relationship between the observed consensus LAI values and the Random Forest-predicted consensus LAI for the test dataset.
Feature importance analysis revealed that the NDVI-based LAI estimates (both Basic and Advanced) were the most influential predictors in the model, despite their lower absolute LAI values. Though the importance of the NDVI-based estimates was greater within the Random Forest model, this result should not be interpreted as an indicator of better biophysical performance or better accuracy at capturing the actual canopy morphology. Instead, the greater importance of the NDVI-based estimates is most likely due to their greater variance and strong soil background, illumination, and mixed-pixel influences, which increase the actual numerical range present within the dataset. From a machine learning perspective, the greater variance present within the dataset improves statistical discriminability and allows the model to better divide the space along the chosen features, increasing their apparent importance. But this variance is not necessarily indicative of better physical realism and may actually signal greater noise amplification and the presence of strong spectral responses unrelated to canopy morphology. This result highlights an important methodological difference between machine learning analysis and traditional statistical analysis: importance measures within ensemble models quantify the statistical utility of the chosen features within the specified framework, not the actual physical realism or accuracy of the features themselves. This means that features with greater vulnerability to the presence of confounding variables may show greater importance within machine learning models despite their underestimation of the LAI.

4. Discussion

This research corroborates and builds upon previous studies validating the empirical estimation of LAI based on vegetation indices, especially in heterogeneous and semi-arid landscapes. There has been extensive research validating the general tendency for LAI to be underestimated in areas dominated by sparse vegetation or soil cover, based on the saturation and influence of background reflectance, especially in NDVI-based estimates [1,6]. Conversely, the Enhanced Vegetation Index (EVI) has been validated to have greater sensitivity to vegetation structure, based on its additional terms for atmospheric resistance and soil adjustments, making it particularly appropriate in heterogeneous urban–agricultural landscapes [7,18,19]. The moderate performance of the SAVI in the current research also corroborates previous studies, highlighting its effectiveness in resisting soil noise while not being as responsive to the high spatial heterogeneity in vegetation cover. In this context, the current research aims to contextualize its results, confirming that the relative ranking of the performance in LAI estimation, as found in the current research, is not exceptional but rather follows well-established patterns in the theoretical and empirical behavior of vegetation indices in small, heterogeneous, and semi-arid landscapes.

4.1. Method Performance Comparison and Conceptual Interpretation

The comparative analysis revealed distinct and statistically significant differences in performance among the four LAI estimation methods within the small urban–agricultural polygon. The EVI-based method demonstrated clear superiority, yielding the highest mean LAI (1.453) and the widest value range. This robustness can be attributed to its inherent design, which incorporates atmospheric resistance (via the blue band) and a soil adjustment factor [7]. These features are particularly advantageous in the semi-arid, heterogeneous study area, where soil exposure and atmospheric haze are prevalent. The EVI’s ability to decouple canopy signals from background noise makes it the most reliable index for capturing true vegetation dynamics in such complex settings.
The SAVI-based method provides a reliable alternative, performing better than NDVI-based approaches due to its soil adjustment component (L = 0.5) [6]. Its intermediate results suggest that it is a viable tool, especially in areas where soil background is a dominant source of error but atmospheric correction is less critical. The mean LAI calculated with each of the four methods is shown in Figure 6.
In contrast, both NDVI-based methods (Basic and Advanced) consistently produced the lowest LAI estimates, with the basic linear model severely underestimating values. This underestimation is a well-documented limitation of NDVI in sparse vegetation and high-soil-influence environments, where its simple red–NIR ratio remains sensitive to soil brightness and atmospheric path radiance [2]. The paradox where NDVI-based LAI estimates were the most influential features in the Random Forest model, despite their lower accuracy, is noteworthy. This likely indicates that the NDVI-derived values provided the model with the greatest statistical variance to partition, but this variance may be driven more by soil and atmospheric artifacts than by true biophysical canopy variation. This finding underscores a critical methodological insight: high feature importance in a machine learning model does not equate to high estimation accuracy when the input features themselves are biased.

4.2. Temporal Dynamics and Ecological Relevance

The observed 66.7% increase in mean LAI from June to August 2023 (Figure 7) strongly reflects the coupled influence of regional phenology and human management. The increase aligns with the peak growing period for major crops in the Chirchik basin (e.g., cotton and maize) and is fundamentally enabled by the intensive irrigation regime that mitigates the semi-arid summer drought. The concurrent increase in LAI standard deviation over time suggests a divergence in vegetation conditions, likely due to differential crop growth stages, irrigation scheduling, or crop type variability across the small polygon. This highlights the utility of high-temporal-resolution monitoring for detecting within-season variability that is critical for precision agriculture and water management in water-scarce regions.

4.3. Validation Approach and Limitations of Internal Consistency

The Random Forest model provided a robust framework for the internal consistency validation of the methodological comparison, achieving strong metrics (R2 = 0.774, RMSE = 0.277). However, it is imperative to clarify that this validation assesses the relative agreement and predictability between the different index-derived LAI methods, not their absolute accuracy. The core limitation of this study is the absence of ground-truth LAI measurements, which precludes validation against a physical reality standard. Consequently, the “superior performance” of the EVI method is established relative to the other indices under the study’s specific conditions, not in an absolute field-verified sense. This reliance on internal cross-validation is a necessary but incomplete step, and the results should be interpreted with this caveat.

4.4. Methodological Limitations and the Mixed Pixel Challenge

Beyond the lack of ground truth, this study is subject to inherent limitations of medium-resolution remote sensing. The use of 10 m Sentinel-2 pixels in a small (0.045 km2), heterogeneous patch means that many pixels are mixed, containing proportions of vegetation, soil, and possibly built surfaces. While EVI and SAVI are designed to mitigate soil effects, they cannot resolve sub-pixel heterogeneity. This mixed-pixel effect fundamentally limits the attainable accuracy of any vegetation index when applied to fine-scale urban–agricultural mosaics and should be acknowledged as a key constraint on LAI estimation accuracy in such landscapes.

4.5. Practical Implications for Urban–Agricultural Management

For professionals managing small urban–agricultural interfaces in semi-arid conditions, the findings provide straightforward recommendations. The EVI-based empirical approach can be recommended for mapping relative leaf area index (LAI) and the spatial pattern of vegetation condition. The strong seasonal variation also underlines the need for multi-temporal mapping over the growing period to support irrigation and crop management. Sentinel-2 imagery is an effective data source for this purpose, despite its limited ability to resolve small landscape units.

4.6. Limitations and Scope

There are a number of limitations that must be considered when attempting to interpret and generalize the results of this study as they are presented. The first is that, because no ground-based measurements of LAI are available, it is impossible to externally validate the results based on physical measurements of canopy properties, and as a result, it is only possible to make a comparative validation of relative results rather than absolute physical accuracy. While this is a problem that is inherent in remote sensing studies of urban and peri-urban areas, particularly in areas with limited data availability, it must be kept in mind that the value of LAI, as determined in this study, should be considered relative to the vegetation index rather than a physical measurement of the canopy leaf area per unit of ground area. The second problem with this study is that it uses a relatively low-resolution Sentinel-2 satellite image with a resolution of 10 m on a very small spatial scale of 0.045 km2, resulting in a strong mixed pixel effect, in which pixels may contain varying amounts of vegetation, soil, and buildings, and although vegetation indices such as EVI and SAVI are relatively effective at reducing the effect of soil and non-vegetation background values, they are incapable of overcoming the limitations of sub-pixel spatial heterogeneity; as a result, it is impossible to achieve a high level of precision when attempting to estimate LAI values on a fine-grained spatial scale of urban–agricultural mosaics.
Third, the empirical coefficients used to estimate LAI were taken from the literature and were not calibrated to the crop types, growth stages, irrigation levels, or soil types of the study region. The absolute LAI values may therefore be biased. Finally, the machine learning component was designed only to assess the consistency among LAI estimates derived from correlated index-based variables. High model performance is therefore not a measure of how accurately the estimates represent the structure of the canopy. Taken together, these limitations indicate that the main contribution of this study is the comparison of methods and a recommendation for relative LAI estimation in small, heterogeneous urban–agricultural areas. They also point to specific directions for future studies: incorporating ground-based LAI measurements for calibration and validation, using higher-resolution optical or UAV imagery to reduce mixed-pixel problems, and fusing multispectral data with 3D LAI estimates derived from LiDAR data for improved green infrastructure assessment.
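The mixed-pixel limitation can be illustrated with a minimal sketch. It assumes a simple area-weighted linear mixing of two endmembers inside one 10 m pixel; the reflectance values, the names `VEG` and `SOIL`, and the helper functions are hypothetical and are not taken from the study data.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def mix(fraction_veg, veg_value, soil_value):
    """Area-weighted linear mixture of two endmember reflectances (assumption)."""
    return fraction_veg * veg_value + (1.0 - fraction_veg) * soil_value


# Hypothetical endmember reflectances, chosen only for illustration.
VEG = {"red": 0.05, "nir": 0.45}
SOIL = {"red": 0.25, "nir": 0.30}


def mixed_pixel_ndvi(fraction_veg):
    """NDVI of a pixel whose area is partly vegetation and partly bare soil."""
    red = mix(fraction_veg, VEG["red"], SOIL["red"])
    nir = mix(fraction_veg, VEG["nir"], SOIL["nir"])
    return ndvi(nir, red)


ndvi_veg = ndvi(VEG["nir"], VEG["red"])
ndvi_soil = ndvi(SOIL["nir"], SOIL["red"])

# NDVI of a pixel that is half vegetation, half soil.
half_mixed = mixed_pixel_ndvi(0.5)

# What one would get by averaging the two endmember NDVI values instead.
area_weighted = 0.5 * ndvi_veg + 0.5 * ndvi_soil

print(f"mixed-pixel NDVI: {half_mixed:.3f}, averaged NDVI: {area_weighted:.3f}")
```

With these illustrative values the mixed pixel yields an NDVI of about 0.429, whereas averaging the endmember NDVI values gives about 0.445. Because the index is a nonlinear function of reflectance, any LAI derived from it at 10 m inherits this sub-pixel ambiguity.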

5. Conclusions

This work offers a systematic comparative assessment of four empirical LAI retrieval approaches (NDVI-Basic, NDVI-Advanced, SAVI-based, and EVI-based) for a small (0.045 km2) heterogeneous urban–agricultural polygon in the Tashkent region, Uzbekistan, based on multi-temporal Sentinel-2 Level-2A imagery. Under the same processing environment, the differences between the approaches were statistically significant. The EVI-based method produced the highest mean LAI (1.453) and the widest dynamic range; it was more sensitive to vegetation changes during the growing season and more robust in a semi-arid environment with exposed soils and mixed pixels. The NDVI-based methods, in contrast, yielded systematically lower LAI values than the other methods, which is consistent with the saturation effect, while the SAVI-based method provided more stable LAI estimates owing to its soil correction term.
The analysis captured strong seasonal dynamics of vegetation growth during the 2023 growing season, with mean LAI increasing by 66.7% from June to August. This trend is consistent with irrigation-induced crop growth and reflects the tight linkage between vegetation phenology and irrigation practices in a semi-arid urban agricultural environment. These outcomes demonstrate the value of high-temporal-resolution satellite data for observing small-scale vegetation dynamics that are important for agriculture and water resource management.
In the absence of ground-truth LAI data, a Random Forest regression model was applied solely as an inter-method consistency analysis. The model reached an R2 of 0.774, indicating strong consistency among the index-derived LAI estimates. Feature importances also indicated that NDVI-derived estimates were given more prominence in the model predictions on account of their higher variance, which confirms that high feature importance does not necessarily imply that a method better represents actual canopy conditions.
The proposed framework, which combines freely available Sentinel-2 data with widely used statistical and machine learning methods, provides a scalable and economical means of making relative LAI estimates in data-poor urban–agricultural settings, but several limitations must be noted. The lack of field-measured LAI limits the results to comparative assessments, and the 10 m pixel resolution of the Sentinel-2 data limits the level of detail in small, heterogeneous landscapes affected by mixed pixels. Despite these limitations, the results offer practical recommendations, suggesting the EVI-based method as the most reliable available approach for relative assessments of vegetation condition in semi-arid urban–agricultural landscapes. Future research should focus on integrating in situ LAI data for calibration and validation, investigating higher-resolution optical or UAV imagery to alleviate the effects of mixed pixels, and integrating 3D vegetation variables from LiDAR data. These developments can help transform the comparative approach discussed in this paper into an operational, validated system for sustainable management of urban ecosystems and green infrastructure.

Author Contributions

Conceptualization, methodology, and software, B.M. and S.K.; validation, B.M. and N.T.; formal analysis and investigation, B.M.; writing—original draft preparation, B.M. and N.T.; writing—review and editing, B.M.; visualization and supervision, N.K.; funding acquisition, N.K. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by University North, under project “Comparing LiDAR data with publicly available datasets” grant number UNIN-TEH-25-1-13.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. Sentinel-2 data can be freely accessed via the Copernicus Open Access Hub. The Python (v3.12) and R (v4.4) codes written for data preprocessing, vegetation index computation, statistical analysis, and machine learning algorithm implementation will be archived in a public repository (such as Zenodo (https://zenodo.org/) or GitHub (https://github.com/)) after acceptance of the manuscript, and the DOI will be included in the published article.

Acknowledgments

The authors gratefully acknowledge the use of Sentinel-2 data provided by the European Space Agency (ESA) via the Copernicus Open Access Hub. The authors also thank the anonymous reviewers for their insightful comments and constructive suggestions, which helped improve the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LAI | Leaf Area Index
NDVI | Normalized Difference Vegetation Index
SAVI | Soil-Adjusted Vegetation Index
EVI | Enhanced Vegetation Index
VI | Vegetation Index
ESA | European Space Agency
MSI | Multispectral Instrument
GIS | Geographic Information System
ANOVA | Analysis of Variance
RF | Random Forest
R2 | Coefficient of Determination
RMSE | Root Mean Square Error
MAE | Mean Absolute Error
APC | Article Processing Charge

References

  1. Baret, F.; Guyot, G. Potentials and Limits of Vegetation Indices for LAI and APAR Assessment. Remote Sens. Environ. 1991, 35, 161–173.
  2. Carlson, T.N.; Ripley, D.A. On the Relation between NDVI, Fractional Vegetation Cover, and Leaf Area Index. Remote Sens. Environ. 1997, 62, 241–252.
  3. Neyns, R.; Canters, F. Mapping of Urban Vegetation with High-Resolution Remote Sensing: A Review. Remote Sens. 2022, 14, 1031.
  4. The Urban Forest and Ecosystem Services. CSA News 2016, 61, 15–30.
  5. Gómez-Baggethun, E.; Gren, Å.; Barton, D.N.; Langemeyer, J.; McPhearson, T.; O’Farrell, P.; Andersson, E.; Hamstead, Z.; Kremer, P. Urban Ecosystem Services. In Urbanization, Biodiversity and Ecosystem Services: Challenges and Opportunities; Springer: Dordrecht, The Netherlands, 2013; pp. 175–251.
  6. Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
  7. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the Radiometric and Biophysical Performance of the MODIS Vegetation Indices. Remote Sens. Environ. 2002, 83, 195–213.
  8. He, J.; Zhang, N.; Su, X.; Lu, J.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.; Tian, Y. Estimating Leaf Area Index with a New Vegetation Index Considering the Influence of Rice Panicles. Remote Sens. 2019, 11, 1809.
  9. Huete, A.; Didan, K.; van Leeuwen, W.; Miura, T.; Glenn, E. MODIS Vegetation Indices. In Land Remote Sensing and Global Environmental Change; Springer: New York, NY, USA, 2011; pp. 579–602.
  10. Scheiber, L.; Zühlsdorff, V.; Nong, D.H.; Ngo, T.S.; Downes, N.K.; Bachofer, F.; Nguyen, H.Q.; Garschagen, M.; Reimuth, A. Monitoring Urban Green Space for Climate-Resilient Development in the Face of Rapid Urbanization: A Tale of Two Vietnamese Cities. Remote Sens. Appl. 2026, 41, 101820.
  11. Ramdani, F. A Very High-Resolution Urban Green Space from the Fusion of Microsatellite, SAR, and MSI Images. Remote Sens. 2024, 16, 1366.
  12. Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150.
  13. Matyukira, C.; Mhangara, P. Advances in Vegetation Mapping through Remote Sensing and Machine Learning Techniques: A Scientometric Review. Eur. J. Remote Sens. 2024, 57, 2422330.
  14. Raufu, I.O. Exploring the Relationship between Remote Sensing-Based Vegetation Indices and Land Surface Temperature through Quantitative Analysis. J. Bulg. Geogr. Soc. 2024, 50, 95–112.
  15. Darabi, H.; Haghighi, A.T.; Klöve, B.; Luoto, M. Remote Sensing of Vegetation Trends: A Review of Methodological Choices and Sources of Uncertainty. Remote Sens. Appl. 2025, 37, 101500.
  16. Yu, W.; Zhou, W.; Dawa, Z.; Wang, J.; Qian, Y.; Wang, W. Quantifying Urban Vegetation Dynamics from a Process Perspective Using Temporally Dense Landsat Imagery. Remote Sens. 2021, 13, 3217.
  17. Zhang, Y.; Wang, Y.; Ding, N. Spatial Effects of Landscape Patterns of Urban Patches with Different Vegetation Fractions on Urban Thermal Environment. Remote Sens. 2022, 14, 5684.
  18. Zhang, H.; Yao, R.; Luo, Q.; Yang, Y. Estimating the Leaf Area Index of Urban Individual Trees Based on Actual Path Length. Build. Environ. 2023, 245, 110811.
  19. Pamungkas, S. Analysis of Vegetation Index for Ndvi, Evi-2, and Savi for Mangrove Forest Density Using Google Earth Engine in Lembar Bay, Lombok Island. IOP Conf. Ser. Earth Environ. Sci. 2023, 1127, 012034.
Figure 1. Location and extent of the study area: a small (0.045 km2) urban–agricultural polygon in the Tashkent region, Uzbekistan.
Figure 2. Workflow of the methodological framework adopted in this study, including Sentinel-2 data acquisition and preprocessing, vegetation index computation, empirical LAI estimation, statistical comparison, and Random Forest-based inter-method consistency analysis.
Figure 3. Distribution of Leaf Area Index (LAI) estimates from four empirical approaches: NDVI Basic, NDVI Advanced, SAVI-based, and EVI-based approaches. Their dynamics during the 2023 growing period are also given (June to August).
Figure 4. Comparison of LAI estimates produced by the four empirical methods based on descriptive statistics and pairwise statistical tests.
Figure 5. Relationship between observed consensus LAI values (median of four index-derived estimates) and Random Forest-predicted consensus LAI for the test dataset.
Figure 6. Mean Leaf Area Index (LAI) values estimated by each empirical method across all acquisition dates.
Figure 7. Seasonal dynamics of mean LAI from June to August 2023, shown for individual estimation methods and as an overall trend.
Table 1. Sentinel-2 Level-2A imagery used in the study, including acquisition dates, cloud cover conditions, processing level, and spatial resolution.
Date | Cloud Cover | Processing Level | Spatial Resolution
23 June 2023 | <15% | Level-2A | 10 m
13 July 2023 | <15% | Level-2A | 10 m
17 August 2023 | <15% | Level-2A | 10 m
Table 2. Vegetation indices.
Method | Description | Formula
NDVI-Basic Method | A simple linear model [16] | $\mathrm{LAI}_{\mathrm{NDVI\text{-}B}} = 3.0 \times \mathrm{NDVI} - 0.5$
NDVI-Advanced Method | A logarithmic transformation based on radiative transfer theory, more sensitive at higher LAI values [17] | $\mathrm{LAI}_{\mathrm{NDVI\text{-}A}} = -\ln(1 - \mathrm{NDVI}) / 0.5$
SAVI-Based Method | A linear model incorporating soil adjustment [6] | $\mathrm{LAI}_{\mathrm{SAVI}} = 2.8 \times \mathrm{SAVI} + 0.2$
EVI-Based Method | A linear model utilizing the atmospherically resistant EVI [7] | $\mathrm{LAI}_{\mathrm{EVI}} = 2.5 \times \mathrm{EVI} + 0.3$
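The four empirical relations in Table 2 can be sketched as follows. This is a minimal illustration and not the authors' code. It reads the two NDVI formulas as LAI = 3.0 × NDVI − 0.5 and LAI = −ln(1 − NDVI)/0.5, an interpretation of the extracted table that reproduces the NDVI-Basic and NDVI-Advanced maxima in Table 3. The index definitions are the standard ones from the literature (SAVI with L = 0.5, EVI with the usual MODIS coefficients), and clipping negative NDVI-Basic values to zero is an assumption suggested by the minimum of 0.000 in Table 3. All function names are hypothetical.

```python
import math


def ndvi(nir, red):
    """Standard NDVI from surface reflectance (e.g. Sentinel-2 B8 and B4)."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Standard SAVI; soil_factor is the soil adjustment term L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir, red, blue):
    """Standard EVI with the commonly used coefficients 2.5, 6, 7.5 and 1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def lai_ndvi_basic(ndvi_value):
    """Linear NDVI model; negative results clipped to zero (assumption)."""
    return max(0.0, 3.0 * ndvi_value - 0.5)


def lai_ndvi_advanced(ndvi_value):
    """Logarithmic NDVI model; only defined for NDVI below 1."""
    if ndvi_value >= 1.0:
        raise ValueError("NDVI must be below 1 for the logarithmic model")
    return -math.log(1.0 - ndvi_value) / 0.5


def lai_savi(savi_value):
    """Linear SAVI model."""
    return 2.8 * savi_value + 0.2


def lai_evi(evi_value):
    """Linear EVI model."""
    return 2.5 * evi_value + 0.3


# Hypothetical reflectances for a single vegetated pixel.
nir, red, blue = 0.45, 0.05, 0.03
estimates = {
    "NDVI-Basic": lai_ndvi_basic(ndvi(nir, red)),
    "NDVI-Advanced": lai_ndvi_advanced(ndvi(nir, red)),
    "SAVI-Based": lai_savi(savi(nir, red)),
    "EVI-Based": lai_evi(evi(nir, red, blue)),
}
for method, value in estimates.items():
    print(f"{method}: {value:.3f}")
```

As a consistency check on this reading, an NDVI of 0.597 gives about 1.291 with the basic model and about 1.818 with the logarithmic model, which matches the two maxima reported in Table 3.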
Table 3. Descriptive statistics of Leaf Area Index (LAI) estimates derived from four empirical methods across all pixels and acquisition dates, including measures of central tendency, dispersion, and distribution shape.
Method | Mean | Std. Dev. | Min | Max | Median | Skewness
NDVI-Basic | 0.292 | 0.264 | 0.000 | 1.291 | 0.245 | 1.150
EVI-Based | 1.453 | 0.448 | 0.770 | 3.504 | 1.363 | 1.061
SAVI-Based | 1.247 | 0.361 | 0.626 | 2.708 | 1.199 | 0.804
NDVI-Advanced | 0.588 | 0.245 | 0.214 | 1.818 | 0.543 | 1.205
Table 4. Seasonal variation in mean LAI values across the 2023 growing season (June–August), illustrating intra-seasonal vegetation dynamics and relative change over time.
Period | Mean LAI | Std. Dev. | % Change (from June)
June 2023 | 0.661 | 0.469 | Baseline
July 2023 | 0.898 | 0.594 | +35.8%
August 2023 | 1.102 | 0.607 | +66.7%
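The relative change column in Table 4 follows from the monthly means with June as the baseline. The short sketch below recomputes it from the rounded values printed in the table; the variable and function names are hypothetical.

```python
# Monthly mean LAI values as printed in Table 4.
monthly_mean_lai = {
    "June 2023": 0.661,
    "July 2023": 0.898,
    "August 2023": 1.102,
}


def percent_change(value, baseline):
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


baseline = monthly_mean_lai["June 2023"]
changes = {
    month: percent_change(value, baseline)
    for month, value in monthly_mean_lai.items()
}

for month, change in changes.items():
    print(f"{month}: {change:+.2f}%")
```

The August value comes out at about +66.72%, in line with the reported +66.7%. The July value comes out at about +35.85% from the rounded means, so the tabulated +35.8% was presumably computed from unrounded monthly means.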
Table 5. Performance metrics of the Random Forest inter-method consistency analysis, including coefficient of determination (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), evaluated on the hold-out test dataset.
Metric | Value | Interpretation
R2 | 0.774 | High explained variance
RMSE | 0.277 | Low prediction error
MAE | 0.201 | High accuracy
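For readers who want to reproduce the kind of metrics listed in Table 5, the sketch below shows how a consensus target (the median of the four index-derived estimates, as described in the Figure 5 caption) and the three scores can be computed. It deliberately leaves out the Random Forest model itself, whose configuration is not given in this section. The sample values and function names are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import fmean, median


def consensus_lai(estimates):
    """Consensus target: median of the index-derived LAI estimates of a pixel."""
    return median(estimates)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


# Illustration only: median of the four method means from Table 3.
# In the study, the consensus is formed per pixel, not from method means.
example_consensus = consensus_lai([0.292, 0.588, 1.247, 1.453])

# Hypothetical held-out consensus values and model predictions.
y_true = [0.5, 1.0, 1.5, 2.0]
y_pred = [0.6, 0.9, 1.6, 1.9]

print(f"consensus: {example_consensus:.4f}")
print(f"R2: {r2_score(y_true, y_pred):.3f}")
print(f"RMSE: {rmse(y_true, y_pred):.3f}")
print(f"MAE: {mae(y_true, y_pred):.3f}")
```

Because the consensus target is built from the same index-derived estimates that describe each pixel, high scores of this kind indicate agreement among methods and not agreement with field-measured LAI.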
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

