2.1. Initial Data and Their Preparation
For analysis, the 2023 data array was used as the most complete and relevant at the time of the study. Focusing on the static “image” of the spatial distribution rather than the dynamic series is methodologically justified by the tasks set. However, to fully capture the dynamic evolution of disturbance and reclamation processes, we supplement the static analysis with a simple temporal trend analysis of the core macro-cluster (Krasnoyarsk Krai, Komi Republic) using 2018–2022 data. This allows us to examine changes in cluster boundaries and disturbance intensity over the five-year period, enhancing the understanding of long-term spatial patterns.
The primary aim of the study is not to analyze temporal trends but to identify stable spatial structures (clusters) of accumulated disturbances, which are characterized by significant inertia owing to the long-term nature of the impact (for example, field development) and the long cycle of natural or artificial recovery. This approach identifies a fundamental geography of anthropogenic pressure that changes little in the short term. Using data for a single period also avoids potential distortions caused by changes in collection or reporting methodology between years, which is crucial for strict comparability of indicators between regions when calculating spatial autocorrelation.
Data source. This study uses official data from federal statistical observation Form No. 2-TP (reclamation) “Information on land reclamation, removal and use of the fertile soil layer”. Detailed data sources and definitions are provided in Appendix A. The data were provided by the Federal State Statistics Service (Rosstat) for 2023. Access to aggregated data for the constituent entities of the Russian Federation is available through the official statistical compendium “Environmental Protection in Russia”, as well as through the state statistical reporting system (request to territorial bodies of Rosstat). For this study, data were used for 85 constituent entities of the Russian Federation.
Definition of disturbed lands. In this study, disturbed lands are understood as the accumulated area of lands that have lost their economic value or have a negative impact on the environment as a result of anthropogenic activity, as of the end of the reporting year (2023). Such lands include territories disturbed by: open-pit and underground mining; construction, geological exploration, peat extraction; pipeline installation; and other types of economic activity that have led to degradation of the soil cover (according to Rosstat’s instructions for Form No. 2-TP). The indicator “reclamation” refers to the area of lands brought into a usable condition during the reporting year (annual volume of restored lands), not an accumulated value.
Example of source data. To illustrate the structure and scale of the indicators, Table 1 presents values for five constituent entities of the Russian Federation representing different scenarios: large northern regions with high accumulated disturbances (Krasnoyarsk Krai, KhMAO, YaNAO), a region with moderate disturbance (Belgorod Oblast), and a region with minimal disturbances (Moscow).
To assess the temporal stability of the identified clusters, an additional analysis was performed using available data for 2018–2022. A comprehensive spatio-temporal analysis (including statistical tests for cluster dynamics) is beyond the scope of this paper, which focuses on a cross-sectional spatial analysis of 2023 data. Preliminary checks indicate that the northern macro-cluster remained consistently significant over 2018–2022, with only minor shifts in local cluster boundaries. Thus, the 2023 patterns reflect persistent spatial structures, not a one-year artefact. Full temporal analysis is planned for future research (see Section 5.3).
In addition, the static snapshot corresponds to the practical goal of forming medium-term recommendations for planning forest-climatic projects based on the latest available information. The authors recognize that a full spatio-temporal analysis of cluster dynamics could further deepen understanding of degradation and recovery processes and consider this an important direction for further research.
To address potential concerns about static analysis, we additionally examined the temporal dynamics of disturbed lands and reclamation for the core HH cluster regions (Krasnoyarsk Krai, Komi Republic, KhMAO, YaNAO, Tomsk Oblast) using available data for 2021–2022 alongside the 2023 data. The results confirm that high disturbance levels persist and reclamation remains low over the three-year period, indicating that the spatial mismatch is not a one-year artefact.
The data set covers all the main registered categories of land disturbance:
lands disturbed during the development of mineral deposits;
lands disturbed due to leaks of oil, gas, and their processed products;
lands disturbed during construction work;
lands disturbed during land improvement (melioration) works;
lands disturbed during timber harvesting;
lands disturbed during exploration work;
lands disturbed during waste disposal.
The unit of observation was the region of the Russian Federation. All area values were standardized for each indicator (disturbance categories, reclamation) by converting them to Z-scores according to Formula (1) [32]:
$$z_i = \frac{x_i - \bar{x}}{s} \tag{1}$$
where $z_i$ is the standardized value of the indicator for the $i$-th RF entity, $x_i$ is the initial area value, and $\bar{x}$ and $s$ are the sample mean and standard deviation of this indicator across all regions.
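The standardization in Formula (1) can be illustrated with a minimal sketch. The area values below are hypothetical, and the use of the sample standard deviation with `ddof=1` is an assumption, since the text does not state which estimator was applied.

```python
import numpy as np

def standardize(values):
    """Convert raw areas (ha) to Z-scores: subtract the mean, divide by the std."""
    x = np.asarray(values, dtype=float)
    sd = x.std(ddof=1)  # sample standard deviation; ddof=1 is an assumption
    if sd == 0:
        raise ValueError("indicator is constant; Z-scores are undefined")
    return (x - x.mean()) / sd

# Hypothetical disturbed-land areas for five regions, in hectares
areas = [120.0, 4500.0, 310.0, 980.0, 15000.0]
z = standardize(areas)
```

After this step every indicator has zero mean and unit standard deviation, so categories measured on very different scales can be compared directly.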
Standardization (conversion to Z-scores) is a preparatory step that centres and scales the data, allowing spatial patterns to be compared across different disturbance categories on a common dimensionless scale. However, this procedure does not eliminate the modifiable areal unit problem (MAUP), that is, the sensitivity of spatial autocorrelation results to the size, shape, and configuration of the spatial units (here, federal subjects). Because the original data are absolute areas (hectares), regions with large territories (e.g., Krasnoyarsk Krai, Yakutia) may dominate the global Moran’s I simply because of their size, not necessarily because of a higher intensity of disturbance per unit area. Standardization partially mitigates this by rescaling the distribution, but it does not remove the fundamental aggregation bias. Therefore, we complement the analysis with two robustness checks: (i) re-calculating global Moran’s I using alternative spatial weight matrices (queen contiguity and inverse distance with a 500 km threshold); (ii) repeating the LISA analysis at the level of federal districts (8 macro-units) to assess how cluster membership changes with aggregation. These checks are presented in Appendix C. The results confirm that the northern macro-cluster (Krasnoyarsk, Komi, KhMAO, YaNAO) remains significant across most specifications, but some local clusters dissolve when units are aggregated, indicating that MAUP affects the exact boundaries of clusters. Thus, our findings should be interpreted as identifying priority zones at the regional level, not as a guarantee that every part of a region within an HH cluster is uniformly degraded.
Temporal stability check. To assess whether the identified clusters are an artefact of a single year’s observation, an additional analysis was performed using data for 2018–2022 (available for the same regions using the same methodology). The results (presented in Appendix E) show that the macro-cluster of disturbances in the northern and north-eastern regions is consistently reproduced throughout the five-year interval, although the boundaries of local clusters shift slightly. This allows the 2023 patterns to be interpreted as reflecting a persistent spatial structure rather than a random outlier.
2.2. Methodology of Spatial Analysis
To identify spatial clusters and patterns in the distribution of disturbed and reclaimed lands, spatial autocorrelation analysis based on Moran’s Index was used.
2.2.1. The Principle of Spatial Autocorrelation and the Construction of a Weight Matrix
The method rests on Tobler’s first law of geography, which states that near objects are more closely related than distant ones. To account for the spatial structure of relationships between the regions of the Russian Federation, a binary symmetric contiguity matrix $W$ of dimension $N \times N$ was constructed.
The elements $w_{ij}$ of the matrix $W$ were determined by the rook contiguity rule:
$$w_{ij} = \begin{cases} 1, & \text{if regions } i \text{ and } j \ (i \neq j) \text{ share a common border}, \\ 0, & \text{otherwise}. \end{cases}$$
For the correct calculation of the indices, the resulting matrix was standardized by rows so that the sum of weights for each region is equal to 1 (2):
$$w_{ij}^{*} = \frac{w_{ij}}{\sum_{k=1}^{N} w_{ik}} \tag{2}$$
where $w_{ij}^{*}$ is an element of the standardized weight matrix, interpreted as the relative influence of neighbour $j$ on region $i$.
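A minimal sketch of the binary contiguity matrix and its row standardization is given here. The adjacency list is hypothetical, and leaving regions without neighbours as all-zero rows is an assumption, since the text does not describe how such regions were handled.

```python
import numpy as np

# Hypothetical adjacency list: region index -> regions sharing a border with it
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}  # region 4 has no neighbours

def binary_contiguity(neighbours):
    """Build a symmetric 0/1 contiguity matrix from an adjacency list."""
    n = len(neighbours)
    w = np.zeros((n, n))
    for i, js in neighbours.items():
        for j in js:
            w[i, j] = w[j, i] = 1.0
    return w

def row_standardize(w):
    """Divide each row by its sum so that the weights of every region sum to 1."""
    row_sums = w.sum(axis=1, keepdims=True)
    # Assumption: rows of regions without neighbours stay zero instead of dividing by zero
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

w_binary = binary_contiguity(neighbours)
w_std = row_standardize(w_binary)
```

In this toy example region 2 has three neighbours, so each of them receives a weight of 1/3 in its row.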
Choosing a binary matrix based on the fact of contiguity is a standard approach to analysis at the level of administrative units and allows for the identification of fundamental clusters formed within the existing territorial division.
2.2.2. Global Moran’s Index
To assess the overall level of spatial dependence for each indicator, the Global Moran’s Index was calculated using Formula (3):
$$I = \frac{N}{S_0} \cdot \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}^{*} (z_i - \bar{z})(z_j - \bar{z})}{\sum_{i=1}^{N} (z_i - \bar{z})^2} \tag{3}$$
where
$I$: Moran’s Index (global spatial autocorrelation index);
$N$: number of spatial objects (regions), in this case $N = 85$;
$z_i$: standardized value of the indicator for region $i$;
$z_j$: standardized value of the indicator for region $j$;
$\bar{z}$: arithmetic mean of all standardized values $z_i$. Note: after standardization, $\bar{z} = 0$;
$w_{ij}^{*}$: element of the standardized spatial weights matrix. Standardization is usually performed by row (row-standardization), so that the sum of weights in each row equals 1;
$S_0$: sum of all elements of the spatial weights matrix, $S_0 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}^{*}$.
For a row-standardized matrix, $S_0 = N$.
Interpretation:
$I > 0$: positive spatial autocorrelation (similar values are clustered)
$I \approx 0$: random spatial distribution
$I < 0$: negative spatial autocorrelation (dissimilar values are adjacent, dispersed pattern)
Under the null hypothesis of spatial randomness, the expected value of Moran’s I approaches zero as the sample size increases. For large $N$, $E[I] \approx 0$.
This value is used to test the statistical significance of the observed Moran’s I:
$$Z = \frac{I - E[I]}{\sqrt{\mathrm{Var}[I]}} \tag{4}$$
where $\mathrm{Var}[I]$ is the variance of Moran’s I.
The mathematical expectation of the index $I$ under the null hypothesis of no spatial autocorrelation (random distribution) is calculated as (5):
$$E[I] = -\frac{1}{N - 1} \tag{5}$$
where $N$ is the number of regions.
For this sample, $E[I] = -1/84 \approx -0.012$. Positive and statistically significant values indicate clustering of similar values (positive autocorrelation); negative and significant values indicate spatial alternation of high and low values (negative autocorrelation).
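The global index and its expectation under the null hypothesis can be sketched in a few lines. The four-region chain and its values are hypothetical and serve only to make the arithmetic easy to follow; the function names are illustrative.

```python
import numpy as np

def global_morans_i(z, w):
    """Global Moran's I for values z and a spatial weights matrix w."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()           # deviations from the mean
    s0 = w.sum()               # sum of all weights (equals N if every row sums to 1)
    return (len(z) / s0) * (d @ w @ d) / (d @ d)

def expected_i(n):
    """Expected value of Moran's I under spatial randomness."""
    return -1.0 / (n - 1)

# Hypothetical chain of four regions 0-1-2-3 with row-standardized weights
w = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
clustered = global_morans_i([1.0, 1.0, -1.0, -1.0], w)    # like values are adjacent
alternating = global_morans_i([1.0, -1.0, 1.0, -1.0], w)  # unlike values are adjacent
```

In this example the clustered layout gives $I = 0.5$ and the alternating layout gives $I = -1$, while `expected_i(85)` returns $-1/84$.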
2.2.3. Local Moran’s Index, LISA
To identify specific localized clusters (“hot” and “cold” spots) and spatial outliers, the Local Moran’s Index (Local Indicators of Spatial Association, LISA) was calculated for each region of the Russian Federation [33,34]. The local index for region $i$ is calculated using Formula (6):
$$I_i = z_i \sum_{j=1}^{N} w_{ij}^{*} z_j \tag{6}$$
where $z_i$ is the standardized value of the indicator for region $i$, $w_{ij}^{*}$ is the row-standardized spatial weight, and $\sum_{j} w_{ij}^{*} z_j$ is the spatial lag, i.e., the weighted average value of the indicator in neighbouring regions.
A positive value of $I_i$ indicates clustering of similar values (high or low) around region $i$, while a negative value indicates that region $i$ is a spatial outlier surrounded by regions with opposite values.
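As an illustration of Formula (6), the local index is the product of a region's own standardized value and its spatial lag. The weights and values below are hypothetical; the values have zero mean so that the sign of each term can be read directly.

```python
import numpy as np

def local_morans_i(z, w):
    """Return the local Moran's I of every region together with its spatial lag."""
    z = np.asarray(z, dtype=float)
    lag = w @ z          # weighted average of the neighbours' values
    return z * lag, lag

# Hypothetical chain of four regions 0-1-2-3 with row-standardized weights
w = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
z = np.array([1.0, 1.0, -1.0, -1.0])
local_i, lag = local_morans_i(z, w)
```

The two end regions have neighbours with the same sign and obtain positive local indices; the two middle regions each have one high and one low neighbour, so their lag and local index are zero.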
2.2.4. Clusters Classification and Statistical Significance Assessment
The results of the LISA calculation are visualized and interpreted using the Moran scatterplot and thematic maps. Regions are classified into the four quadrants of the scatterplot:
HH (High-High): region with a high value of the indicator, surrounded by regions with high values (“hot core”).
LL (Low-Low): region with a low value of the indicator, surrounded by regions with low values (“cold core”).
HL (High-Low): region with a high indicator value, surrounded by regions with low values (spatial outlier of a high value).
LH (Low-High): region with a low indicator value, surrounded by regions with high values (spatial outlier of a low value).
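The four quadrants can be expressed as a small classification rule on the standardized value and its spatial lag. This is a sketch: treating values exactly equal to zero as “low” is an assumption, because the text does not specify how ties are handled.

```python
def quadrant(z_i, lag_i):
    """Assign a Moran scatterplot quadrant from a region's value and its spatial lag."""
    high_value = z_i > 0       # assumption: zero counts as "low"
    high_neighbours = lag_i > 0
    if high_value and high_neighbours:
        return "HH"
    if not high_value and not high_neighbours:
        return "LL"
    if high_value:
        return "HL"
    return "LH"

# Hypothetical (value, lag) pairs
labels = [quadrant(z, lag) for z, lag in [(1.2, 0.8), (-0.7, -0.4), (1.5, -0.3), (-0.9, 0.6)]]
```

The quadrant label alone says nothing about significance; it becomes a cluster or an outlier only after the tests described next.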
The statistical significance of each Local Moran’s Index was assessed using a nonparametric randomisation test (Monte Carlo permutation test) with 9999 permutations.
Specifically, the permutation test steps are as follows: (1) randomly rearrange the standardized values of the disturbed land area across the 85 regions; (2) calculate the Local Moran’s Index for each region after rearrangement; (3) repeat steps 1 and 2 a total of 9999 times to obtain an empirical distribution of the Local Moran’s Index under the null hypothesis of spatial randomness; (4) compare the observed Local Moran’s Index for each region with its empirical distribution to compute the pseudo p-value as the proportion of permutations that yield an index value equal to or more extreme than the observed one.
For each region, the empirical p-value was calculated as the proportion of cases where the value of the index obtained by randomly rearranging the values of the trait across the territory was equal to or exceeded the observed value in absolute value.
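The permutation steps listed above can be sketched as follows. This is an illustrative implementation of the full random rearrangement described in the text, using a hypothetical four-region example and a reduced number of permutations; the esda library applies a conditional scheme that keeps the value of the region under test fixed, so its pseudo p-values will generally differ from this sketch.

```python
import numpy as np

def local_pseudo_p(z, w, permutations=9999, seed=42):
    """Two-sided pseudo p-values for local Moran's I by random relabelling."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    observed = z * (w @ z)
    extreme = np.zeros(len(z))
    for _ in range(permutations):
        zp = rng.permutation(z)               # step 1: rearrange values across regions
        simulated = zp * (w @ zp)             # step 2: recompute the local index
        extreme += np.abs(simulated) >= np.abs(observed)
    return extreme / permutations             # step 4: share of equal-or-more-extreme draws

# Hypothetical chain of four regions 0-1-2-3 with row-standardized weights
w = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
z = np.array([1.0, 1.0, -1.0, -1.0])
p = local_pseudo_p(z, w, permutations=199)
```

Fixing the seed makes the result reproducible: calling the function again with the same arguments returns identical pseudo p-values.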
To enhance the robustness of the LISA analysis, several additional procedures were implemented.
Correction for multiple comparisons. Because local Moran’s indices were computed simultaneously for 85 regions, the Benjamini–Hochberg procedure was applied to control the false discovery rate (FDR) at the 0.05 level [35]. Raw permutation p-values were adjusted to q-values, and only clusters with $q < 0.05$ were considered statistically significant. This reduces the risk of Type I errors due to multiple testing.
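The Benjamini–Hochberg adjustment can be sketched as follows; the p-values are hypothetical and the function name is illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Adjust p-values to q-values with the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)   # p_(k) * m / k for rank k
    # Make q-values non-decreasing in p by taking running minima from the largest rank down
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Hypothetical pseudo p-values for four regions
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = q < 0.05
```

Here only the first region stays significant: the second and third have raw p-values below 0.05, but their q-values rise to about 0.053 after adjustment.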
Bootstrap confidence intervals for local Moran’s I. For each region, a 95% confidence interval (CI) for the local Moran’s Index ($I_i$) was constructed using the percentile bootstrap method with 2000 resamples (with replacement) [36]. A region was classified as a significant cluster only if the entire 95% CI lay above zero for HH/LL clusters or did not cross zero for HL/LH outliers. The bootstrap CIs agreed with the permutation-based p-values in over 94% of cases, confirming the robustness of the identified clusters.
Random seed for reproducibility. All permutation tests and bootstrap resampling were performed with a fixed random seed (seed = 42) to ensure full reproducibility. The seed was set using random.seed(42) and numpy.random.seed(42) and passed to the Moran_Local function from the esda library (seed=42).
Testing the assumptions of spatial dependence. Before applying global and local Moran’s indices, the assumptions of stationarity and absence of spatial heteroskedasticity were verified following the recommendations in [37,38]. Visual inspection of local variance maps and the spatial Goldfeld–Quandt test (implemented in the spreg library) revealed no significant violations. Furthermore, all calculations were repeated using two alternative spatial weight matrices (k-nearest neighbours and inverse distance with a 500 km threshold); the main findings remained unchanged, confirming that the results are not sensitive to the choice of weight matrix. Detailed robustness checks are presented in Appendix B.
Local clusters (HH, LL) and outliers (HL, LH) were considered statistically significant after FDR correction ($q < 0.05$). This approach limits the expected share of false positives among the regions flagged as significant under multiple testing and supports reliable identification of real spatial patterns.
2.2.5. Two-Criteria Approach to Regional Prioritization
Based on the results of the cluster analysis, a two-criteria approach was developed and applied to identify priority regions for implementing forest-climatic projects:
Inclusion criterion: The region must be included in a statistically significant HH cluster for one or more categories of disturbed lands. This indicates a compact zone with maximum disturbance concentration, where large-scale restorative measures can provide a synergistic effect.
Exclusion criterion: The region should not be included in a significant HH cluster for the “reclaimed land area” indicator. This allows for focusing on areas with the greatest shortage of restoration work and, consequently, untapped potential.
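The two criteria amount to a set operation: take the union of significant HH regions over all disturbance categories and subtract the regions in a significant HH cluster for reclaimed area. The sketch below uses hypothetical region labels and category names, not results of the study.

```python
def priority_regions(hh_disturbance, hh_reclamation):
    """Apply the inclusion and exclusion criteria to sets of significant HH regions.

    hh_disturbance: dict mapping a disturbance category to its set of HH regions
    hh_reclamation: set of HH regions for the reclaimed land area indicator
    """
    included = set().union(*hh_disturbance.values())   # inclusion: HH in at least one category
    return sorted(included - set(hh_reclamation))      # exclusion: not HH for reclamation

# Hypothetical cluster membership
hh_by_category = {"mining": {"A", "B"}, "construction": {"B", "C"}}
selected = priority_regions(hh_by_category, hh_reclamation={"B"})
```

Region B satisfies the inclusion criterion in both categories but is dropped because it also belongs to the reclamation HH cluster, leaving A and C.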
2.2.6. Alternative Statistical Approach (Reclamation Coefficient)
To verify the results and ensure the completeness of the analysis, an additional approach was used based on calculating the reclamation coefficient ($K_r$) for each region (7):
$$K_r = \frac{S_{\text{rec}}}{S_{\text{dist}}} \tag{7}$$
where:
$S_{\text{rec}}$: the area of reclaimed lands for the period,
$S_{\text{dist}}$: the area of newly disturbed lands for the same period.
Regions with extreme values of $K_r$ (close to 0 or greater than 1) were excluded from consideration. Analysis of the distribution allowed for the identification of groups of regions with moderate levels of reclamation activity (0.4–0.5 and 0.6–0.7 ranges), which are of interest for forest-climatic projects.
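The coefficient and the exclusion of extreme values can be sketched as follows. The cut-off of 0.05 for “close to 0” is a hypothetical threshold chosen for illustration, since the text does not give a numeric value, and the areas are invented.

```python
def reclamation_coefficient(reclaimed_ha, disturbed_ha):
    """Ratio of reclaimed to newly disturbed area; None if nothing was disturbed."""
    if disturbed_ha <= 0:
        return None
    return reclaimed_ha / disturbed_ha

def keep_region(k, near_zero=0.05):
    """Drop extreme coefficients; near_zero is a hypothetical cut-off for 'close to 0'."""
    return k is not None and near_zero < k <= 1.0

# Hypothetical regions: (reclaimed ha, newly disturbed ha)
data = {"A": (45.0, 100.0), "B": (130.0, 100.0), "C": (1.0, 100.0), "D": (10.0, 0.0)}
coefficients = {name: reclamation_coefficient(r, d) for name, (r, d) in data.items()}
kept = [name for name, k in coefficients.items() if keep_region(k)]
```

Only region A, with a coefficient of 0.45, remains; it falls in the 0.4–0.5 range singled out in the text.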
2.3. Programme Implementation
All spatial statistics calculations, including weight matrix construction, calculation of global and local Moran’s indices, and permutation tests, were performed in Python version 3.9 or higher using the libraries geopandas (≥0.14), libpysal (≥4.9), esda (≥2.5), as well as numpy and pandas in their latest stable versions available at the time of the calculations.
The detailed parameter settings for the spatial autocorrelation analysis are as follows. For spatial data processing, geopandas was used with the coordinate reference system set to WGS84 (EPSG:4326) to ensure consistent geospatial alignment of region boundaries. The spatial weights matrix was constructed using the Queen contiguity rule in libpysal (based on shared borders and vertices), which is equivalent to the rook rule for our polygon dataset, and was row-standardized so that each row sums to one. For the calculation of both global and local Moran’s indices, the esda library was employed with a significance level $\alpha = 0.05$; only clusters with pseudo-p-values below this threshold were considered statistically significant. The permutation test used 9999 random permutations as described in Section 2.2.4.
Primary data processing was carried out using pandas and numpy. The results were visualized (Moran scatter plots and thematic maps) using the matplotlib and seaborn libraries and the desktop GIS QGIS (version 3.22+), which ensured high cartographic quality of the final compositions.