Change Detection of Amazonian Alluvial Gold Mining Using Deep Learning and Sentinel-2 Imagery

Introduction
Artisanal and small-scale gold mining (ASGM) is an emerging threat to the conservation and preservation of tropical riverine systems across the planet [1,2]. This method of mining involves the removal of aboveground biomass and the processing of alluvial soil sediments for the retrieval of minute historical deposits of gold particles. ASGM typically involves operations at a much broader spatial scale than pit mining, as the concentration of gold particles is comparatively low in alluvial fans and historical river channels [3]. As a result, ASGM is generally associated with land cover/land use (LCLU) change that can encompass large areas, including the clearing of primary tropical rainforest.
The presence of this type of mining in small pockets of the Amazon Basin is not new. However, the expansion of ASGM as a driver of land-cover change throughout Amazonia and in other tropical ecosystems has increased remarkably over the past decade. For instance, in the Peruvian department of Madre de Dios, ASGM was responsible for the removal of over 120,000 ha of primary tropical forest from 1984 to 2017 [4]. ASGM has also taken hold outside the Amazon, including in Nigeria [5], Ghana [6], Laos [7], and Indonesia [8]. The intensification of ASGM has led to profound impacts on river biogeochemistry [1], human health [9], and conserved areas [4], making it a significant driver of land-use change in tropical landscapes and riverine systems. Water is essential for the mining process, and shallow tropical water tables quickly fill any excavation. As a result, entire landscapes that were once primary forests have been converted to a mixture of ponds and bare earth, creating novel hydroscapes and greatly changing restoration potential [10] (Figure 1). As ASGM has intensified globally, monitoring efforts to detect mining activity have been of significant interest for conservation and governance purposes. Current efforts to monitor ASGM landscapes, including the presence of mining ponds and water bodies left over from sediment extraction, generally make use of satellite-based remotely sensed imagery (e.g., [4,11]). This work often relies on indices that compare reflectance band data from these sensors to categorize the land surface into broad groups, a technique that is also used for monitoring small water bodies [12][13][14]. However, these methods generally work on a per-pixel basis and do not track temporal change across time series.
Recently, developments in deep learning have led to an increased capacity for monitoring LCLU change more discretely, allowing for segmentation and labeling of individual features or objects within digital imagery. Among these methods are convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A CNN is a multilayer neural network that is inspired by the model of the primate visual system [15] and is utilized for learning features [16] and classification problems [17,18]. Specifically, a CNN relies on two-dimensional spatial contexts within imagery data to generate edges and identify features. As a result, CNN-based deep learning is widely used for feature-extraction tasks such as semantic segmentation [19], landslide detection [20], object detection [21,22], and change detection [23]. RNNs have the capacity to re-apply past weights to layers in the neural network. By remembering the spatial features over time, RNNs exploit temporal context and work naturally with time-series data. RNNs have been used for monitoring and estimating land-cover change [24,25] and crop identification [26,27]. When these two types of neural networks are combined into a single network (ReCNN; [28]), time-series multispectral data can be analyzed in a way that detects features as well as changes in the condition of these features over time.
While it may appear that deep learning only provides a more detailed estimate of land-cover change when compared to conventional techniques, these new methods may be transformative in guiding policy and mitigation measures. For instance, in the context of ASGM, general methods using spectral indices alone describe the area of primary tropical forest biomass that has been converted [4,29] as well as the presence of new mining ponds [10]. These mining ponds, or lagoons, are generally 3-4 m deep water bodies produced as sediment is excavated for processing. The excavated areas are filled with water via hoses and pumps to hasten the erosion and liquefaction of the soil. When local mining abates, sediment concentrations in the pond water column decrease while phytoplankton and algae increase in still water [1,30]. Understanding such changes provides insight into the effectiveness of mining and conservation policy across a landscape [30]. Ultimately, deep learning methods that provide time-series data on the reflectance of individual features on a landscape may provide great utility for land-use change science and analysis.
In this work, we show how deep learning can be used to more thoroughly evaluate object-oriented LCLU change via satellite imagery. To do so, we utilize ReCNN, which combines a CNN and an RNN into a single network [28], to detect and categorize the changes in mining ponds created by ASGM activities. This ReCNN is compared with a semi-supervised model, support vector machines with smoothed total variation (SVM-STV). Specifically, we examine the outcomes from these models, as well as a number of labeling methods, to understand the applicability of these techniques to land-cover changes associated with ASGM. We focus on mined areas in the Peruvian department of Madre de Dios, a global hotspot of ASGM activity. We then transfer the model to other international ASGM sites to showcase its utility. The primary contributions of the study are:

• The creation of an open-source labeled dataset of water body change pertaining to ASGM that can be used for training and consistent evaluation of algorithm performance;
• An evaluation of labeling methods and approaches for use with supervised model construction;
• An assessment of supervised and semi-supervised methods in the context of detecting and characterizing mining ponds from ASGM activity;
• A test of the best-performing models at a selection of out-of-sample international ASGM sites to examine universal model utility.

ASGM Ponds Dataset and Change Characterization
Our main study region is located within the Peruvian department of Madre de Dios (MDD), a globally significant hotspot of ASGM activity. We selected 16 distinct smaller region samples (~70 km² each) of interest within MDD to highlight locations that had experienced mining pond surface-area increases as well as notable deforestation (Figure 2; Table A1 in Appendix A). These regions were selected for two main reasons: firstly, the regions spanned a gradient of significant mining intensity, techniques, and policy enforcement over the last 15 years. Secondly, the regions were shaped so as to maximize the number of pixels undergoing change between bi-temporal images, thereby providing a more thorough test of the models. A total selection of sixteen regions allowed for a fully representative sample of sites with these two considerations in mind.
We acquired Sentinel-2 top-of-atmosphere reflectance data for these 16 regions via the Google Earth Engine platform. The Sentinel-2 satellite constellation [31] was developed for monitoring variability in LCLU conditions at frequent revisit times (5 days at the equator) and consists of 13 multispectral channels ranging from ultra-blue to shortwave infrared with pixel resolutions between 10 and 60 m GSD. Sentinel-2 data is widely used to assess LCLU change in the context of surface water [32,33]. We selected data from two different years (2019 and 2021) with very low cloud coverage to showcase periods in which significant land-use change had occurred. We removed the influence of atmospheric effects by histogram matching of corresponding images of the same region and used Sentinel-2 metadata about cloud cover to remove any residual clouds on the images. LCLU changes due to alluvial gold mining occur across different parts of the world, so it is crucial that change-detection algorithms generalize from one geographical region to another. Thus, we included an out-of-sample testing dataset containing instances of similar alluvial gold mining in Indonesia, Myanmar, and Venezuela that are of similar intensity to that in MDD (Table A2 in Appendix A).
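The histogram-matching step mentioned above can be illustrated with a minimal, standard-library sketch for a single band. It is not the authors' implementation: the function name `match_histogram`, the flat-list representation of a band, and the choice of which image serves as the reference are all assumptions made for illustration.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each source value to the reference value at the same empirical quantile.

    `source` and `reference` are flat lists of reflectance values for one band
    (hypothetical representation; a real workflow would operate on rasters).
    """
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Empirical CDF of the source band at this value, in (0, 1].
        quantile = bisect_right(src_sorted, value) / n_src
        # Position of the same quantile in the sorted reference band.
        idx = min(n_ref - 1, max(0, int(round(quantile * n_ref)) - 1))
        matched.append(ref_sorted[idx])
    return matched


if __name__ == "__main__":
    image_2021 = [30, 10, 40, 20]       # band values to adjust (toy data)
    image_2019 = [100, 200, 300, 400]   # band values used as the reference (toy data)
    print(match_histogram(image_2021, image_2019))
```

In a bi-temporal setting the function would be applied band by band, so that both dates share a comparable radiometric distribution before change is assessed.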
For the purpose of generating meaningful labeled data, we defined three different pond states in relation to the recency of mining: active, transition, and inactive (Figure 3d-f). These basic categories can provide useful information regarding ongoing mining activities, such as intensification, cessation, and the effect of governance [1,4].
A subgroup of individuals in the research group with expertise in the characterization of alluvial gold mining manually segmented and labeled each individual pond in the dataset. Ponds were segmented by manually tracing their edges, and pond status was determined from side-by-side visual observation of the RGB composite and the shortwave-infrared, green, and blue (SWGB) composite images for each region (Figure 3a-c). These band combinations were chosen specifically to help discriminate between active sites, in which sediment reflects strongly in the red band, and inactive sites, in which photosynthetic material is present and influences the shortwave infrared reflectance (Figure 3a). For consistency, we calculated the distribution of the color index C_idx = (green − red)/(green + red) over pond pixels and chose thresholds of 0 and 0.15 to select ponds in a transition state.
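As a rough illustration of the color index, the sketch below computes C_idx for a pair of green and red reflectance values and assigns a pond state. The text only states that thresholds of 0 and 0.15 bound the transition state; mapping values below 0 to "active" (red-dominated, sediment-laden water) and values above 0.15 to "inactive" (green-dominated water) is an assumption inferred from the band description, and the function names are hypothetical.

```python
def color_index(green, red):
    """C_idx = (green - red) / (green + red); returns 0.0 when both bands are zero."""
    total = green + red
    if total == 0:
        return 0.0
    return (green - red) / total


def pond_state(green, red, low=0.0, high=0.15):
    """Assign a pond state from the color index.

    Assumption: index < low -> 'active', low..high -> 'transition',
    index > high -> 'inactive'. Only the two thresholds come from the text.
    """
    c_idx = color_index(green, red)
    if c_idx < low:
        return "active"
    if c_idx <= high:
        return "transition"
    return "inactive"


if __name__ == "__main__":
    # Toy reflectance values, chosen only to exercise each branch.
    for green, red in [(0.10, 0.20), (0.11, 0.10), (0.20, 0.10)]:
        print(green, red, round(color_index(green, red), 3), pond_state(green, red))
```

In the study the index served as a consistency check on expert labels rather than as a classifier, so a sketch like this would complement manual labeling, not replace it.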

Modeling Approaches
We considered two main approaches for modeling and quantifying change in residual ponds: a supervised deep learning method based on ReCNN [28] and a semi-supervised method involving a support vector machine and smoothed total variation regularizer [34].

Supervised Deep Learning Approach
We extended the ReCNN model of Mou et al. [28], originally designed to detect LCLU-type changes in urban areas using satellite imagery, for the detection of large and subtle changes relative to water bodies. First, we augmented the ReCNN model to include a second LSTM plus dropout layer between the original two LSTM layers (Figure 4) to capture subtle pond-state changes. Second, we modified the input layer to receive two temporal images separately instead of two concatenated images as is done in other studies (e.g., [35,36]). We refer to this implementation as extended ReCNN (E-ReCNN) throughout the remainder of this paper.

Semi-Supervised Learning Approach
Unsupervised and semi-supervised learning methods are widely applied in remote sensing applications involving small datasets and limited access to high-performance computing equipment. SVMs are powerful semi-supervised approaches that have been used to detect LCLU change utilizing spectral information of each pixel separately [37][38][39]. A recent SVM-STV approach by Chan et al. [34] also utilized spatial information contained across image regions. We modified this approach to include a lifting option for multispectral images. Lifting is a preprocessing step that can aid the segmentation of RGB images through the use of color spaces and additional features [40][41][42][43]. We combined the RGB and L*a*b* color spaces in the images as features and then performed segmentation to reduce the effects of high correlation in one color space [43].
Figure 5 illustrates the two main steps in SVM-STV. In the first step, we formed the feature vectors v from the difference in the bi-temporal images. Then we used a pixel-wise ν-support vector classifier (ν-SVC) with a radial basis function kernel to find a hyperplane maximizing the margins between each pair of classes, using a one-against-one strategy, and to assign each pixel a vector of probabilities of belonging to each class [44,45]. The difference in L*a*b* color space between bi-temporal images is included in the feature vector v if the lifting option is enabled. In the second step, a smoothed total variation (STV) regularizer smooths the probability vector and consequently the classification map.

Statistical Approaches, Training, and Operation
In order to understand the performance of the proposed approaches and the impact of spectral information and image preprocessing, we designed a number of test-train experiments across the 16 numbered regions within Madre de Dios (Figure 2). Additionally, to examine the generalizability of the approaches to ASGM sites in other locations in the tropics, we constructed a set of out-of-sample testing regions in Venezuela, Indonesia, and Myanmar.
Since the supervised and semi-supervised modeling approaches used different quantities and distributions of labels, we used slightly different training approaches for each model. For the E-ReCNN model, we used a leave-one-region-out cross-validation approach. This method is often used for classification in medical imaging (e.g., leave-one-patient-out) [18,46] to account for class imbalance and region information. Specifically, we left one of the sixteen MDD regions out for testing and used the remaining fifteen regions for training and validation. We iterated this process for each individual region, allowing each region to serve as a testing region once. For each iteration, the patches from the remaining fifteen regions were split into training (70% of all patches) and validation (30% of all patches) sets. Because we were examining the influence of the number of channels included in the model, this process was repeated for each multispectral image in the 3-, 6-, and 10-Channel image sets. Nesterov Adam [47], an improved Adam optimizer [48], was used to accelerate convergence relative to both Adam and stochastic gradient descent (SGD). The parameters of the model producing the best average predictive results are listed in Appendix A Table A2. All testing and training using E-ReCNN were performed on the Wake Forest University DEAC HPC Cluster [49] (Appendix A Table A3).
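The leave-one-region-out scheme can be sketched as a small generator. This is an illustrative outline only: the dictionary layout `patches_by_region`, the function name, the random seed, and the shuffling of pooled patches before the 70/30 split are assumptions, since the text does not specify how patches were assigned to training and validation.

```python
import random


def leave_one_region_out(patches_by_region, train_fraction=0.7, seed=0):
    """Yield (test_region, train, validation, test) for every region in turn.

    `patches_by_region` maps a region id to a list of patches (hypothetical
    layout). Patches from all non-test regions are pooled, shuffled, and split
    into training and validation sets according to `train_fraction`.
    """
    rng = random.Random(seed)
    for test_region in sorted(patches_by_region):
        pool = [
            patch
            for region in sorted(patches_by_region)
            if region != test_region
            for patch in patches_by_region[region]
        ]
        rng.shuffle(pool)
        cut = int(round(train_fraction * len(pool)))
        yield test_region, pool[:cut], pool[cut:], list(patches_by_region[test_region])


if __name__ == "__main__":
    # Toy data: 16 regions with 10 placeholder patches each.
    toy = {r: [(r, i) for i in range(10)] for r in range(1, 17)}
    for region, train, val, test in leave_one_region_out(toy):
        print(region, len(train), len(val), len(test))
```

Each fold keeps every patch of the held-out region out of both training and validation, which is what allows the per-region test scores to reflect spatial generalization.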
To train the SVM-STV semi-supervised model, we first trained the ν-SVC and then performed denoising on the probability map that the ν-SVC produces. In the context of semi-supervised learning, fewer than 1% of labels were randomly selected for training, whereas over 99% of the labels were unknown. Thus, in the training process of each region, instead of including all the pixels in the ν-SVC, we only incorporated a subset of randomly chosen labeled points from each region. Therefore, for each of the 16 MDD regions, we first specified the number of labeled pixels per class (N_k) for training the model. Next, we used the preprocessed, randomly selected N_k × K labeled pixels to train the ν-SVC with five-fold cross-validation, where K was the number of classes. The trained ν-SVC was then applied to predict the probability tensor. Finally, the denoising parameters were tuned based on the probability maps of each region. The training procedure for SVM-STV was computationally feasible and had a rapid training time as it only used a small portion of randomly selected labeled data (0.004-0.2%). All testing and training of the SVM-STV method were conducted in the same environment: Intel® Core™ i7-10875H CPU @ 2.30 GHz, 8 cores, 64 GB RAM, Windows 64-bit system, and MATLAB R2021a.
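The label-selection step, drawing N_k labeled pixels per class so that N_k × K pixels are available for training, might look like the sketch below. The authors worked in MATLAB; this Python outline, its function name, and its handling of classes with fewer than N_k labeled pixels (all of them are taken) are assumptions for illustration.

```python
import random


def sample_labeled_pixels(labels, n_per_class, seed=0):
    """Return sorted pixel indices with up to `n_per_class` samples per class.

    `labels` is a flat list of integer class labels, one per pixel
    (hypothetical layout). Classes with fewer than `n_per_class` pixels
    contribute every pixel they have.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)
    chosen = []
    for label in sorted(indices_by_class):
        candidates = indices_by_class[label]
        chosen.extend(rng.sample(candidates, min(n_per_class, len(candidates))))
    return sorted(chosen)


if __name__ == "__main__":
    # Toy label map with strong class imbalance (K = 4 classes).
    toy_labels = [0] * 1000 + [1] * 50 + [2] * 30 + [3] * 20
    picked = sample_labeled_pixels(toy_labels, n_per_class=10)
    print(len(picked), len(picked) / len(toy_labels))
```

Sampling a fixed number per class, rather than a fixed fraction of all pixels, keeps the rare change classes represented despite the dominance of unchanged pixels.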
To examine the influence of spectral information on method performance, we constructed three sets of spectral images with varying numbers of spectral bands chosen specifically for application to water and LCLU change:

• A three-band set of images containing red, green, and blue bands (RGB);
• A six-band set of images containing red, green, blue, NIR, SWIR-1, and SWIR-2;
• A 10-band set of images containing red, green, blue, NIR, SWIR-1, SWIR-2, ultra-blue, and bands 5, 6, and 7, which correspond to the vegetation red edge.
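The three image sets can be expressed as band lists keyed by channel count. The band identifiers follow the standard Sentinel-2 naming (B2 blue, B3 green, B4 red, B5-B7 red edge, B11 and B12 shortwave infrared, B1 coastal aerosol, here taken to be the "ultra-blue" band). Using B8 rather than B8A for NIR is an assumption, as are the dictionary and function names.

```python
# Band lists per image set; B8 as the NIR band is an assumption.
BAND_SETS = {
    3: ["B4", "B3", "B2"],
    6: ["B4", "B3", "B2", "B8", "B11", "B12"],
    10: ["B4", "B3", "B2", "B8", "B11", "B12", "B1", "B5", "B6", "B7"],
}


def select_bands(pixel, n_channels):
    """Return the values of one pixel in the band order of the chosen image set.

    `pixel` maps Sentinel-2 band ids to reflectance values (hypothetical layout).
    """
    return [pixel[band] for band in BAND_SETS[n_channels]]


if __name__ == "__main__":
    all_bands = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
                 "B8", "B8A", "B9", "B10", "B11", "B12"]
    toy_pixel = {band: 0.01 * (i + 1) for i, band in enumerate(all_bands)}
    for n in (3, 6, 10):
        print(n, select_bands(toy_pixel, n))
```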
To evaluate the overall performance of the methods, we used three metrics: Cohen's Kappa coefficient [50], the Jaccard index [51], and the F1 score [52,53]. Cohen's Kappa coefficient provides a measure of consistency and reliability in classification tasks. The Jaccard index, also referred to as the intersection over union, measures the overlap between labels and predictions, emphasizing true positives over true negatives. The F1 score is the harmonic mean of precision and recall and does not take true negatives into account. As a result, the changed-area accuracy is not affected by the 'no change' area accuracy, which is high because of the large number of unchanged pixels. We do not present accuracy scores, as these can be misleadingly high (as seen in the Accuracy column of Appendix A Table A4) and, because of severe class imbalance, are poor indicators relative to the other metrics.
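To make the three metrics concrete, the sketch below computes them from a confusion matrix using their textbook definitions. How the scores were averaged across classes and regions in the study is not stated here, so the per-class treatment and the function names are assumptions; the toy labels are invented.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def cohen_kappa(matrix):
    """Kappa = (p_o - p_e) / (1 - p_e), observed vs. chance agreement."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    p_expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (n * n)
    if p_expected == 1:
        return 1.0  # degenerate case: only one class present
    return (p_observed - p_expected) / (1 - p_expected)


def jaccard_and_f1(matrix, cls):
    """Per-class Jaccard index and F1 score; neither uses true negatives."""
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp
    fp = sum(row[cls] for row in matrix) - tp
    denom = tp + fp + fn
    jaccard = tp / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 0.0
    return jaccard, f1


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1]   # toy labels: 0 = no change, 1 = change
    y_pred = [0, 0, 0, 1, 1, 1]
    m = confusion_matrix(y_true, y_pred, 2)
    print(cohen_kappa(m), jaccard_and_f1(m, 1))
```

Because the Jaccard index and F1 ignore true negatives, a large 'no change' class cannot inflate them the way it inflates plain accuracy.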

Results
In this section, we present the results of multi-class change detection on ASGM ponds in multispectral images that were obtained from focal (MDD) and out-of-sample prediction regions. Results from change detection analyses using binary classes (change/no change) can be found in Appendix A, Tables A6 and A7.
The overall performance of the two approaches, across testing regions and using all testing sets with respect to the number of channels, ranged from 0.19 (± 0.06) to 0.92 (± 0.04).
The inclusion of increased spectral information (channels) generally increased performance. Among all experimental settings, the greatest average result of multi-class change classification by E-ReCNN was a Cohen's Kappa of 0.92 (± 0.04), a Jaccard value of 0.88 (± 0.07), and an F1 of 0.88 (± 0.05) for histogram-matched 6-Channel set images (accuracy of 0.99 (± 0.04); see Appendix A Table A4). In contrast, the greatest average result of multi-class change classification by SVM-STV was a Cohen's Kappa value of 0.63 (± 0.07), a Jaccard value of 0.56 (± 0.06), and an F1 of 0.67 (± 0.06) for original (not preprocessed) 10-Channel set images. These results were achieved on images from the MDD region training dataset. The MDD-trained E-ReCNN approach applied to out-of-sample regions (Figure 6, right) performed similarly to the results obtained in the focal MDD region (Figure 6, left), demonstrating the generalization of E-ReCNN across different spatial regions. The SVM-STV approach performed less well on out-of-sample prediction, decreasing by 25% on average. Overall, the E-ReCNN model performance using the 6-channel histogram-matched image sets from the 16 MDD regions resulted in outcomes with high levels of precision, recall, and F1-score (Figure 7, left). Model F1-scores for 'no change' and 'water existence' classes were 0.99 and 0.96, respectively. F1-scores for 'increase' and 'decrease' classes were slightly lower than 'no change' and 'water existence' classes, although the total quantity of labeled pixels for those two classes was notably lower. This pattern of F1-scores across classes was also seen in the out-of-sample regions (Figure 7, right). The total number of classified pixels in these regions was significantly lower, and F1 values for the 'decrease' and 'increase' classes were 0.56 and 0.57, respectively. The performance metrics not biased by smaller sample sizes, including Cohen's Kappa and Jaccard coefficients, were higher than 0.9 for both the MDD regions and the international out-of-sample regions.
In contrast, the SVM-STV model results for 'water existence', 'increase', and 'decrease' classes were less accurate than the E-ReCNN model results, as shown in Figure 8 for the MDD regions (left) and out-of-sample regions (right). F1-scores for the 'increase' and 'decrease' classes in the MDD regions using this semi-supervised method were lower than those obtained by E-ReCNN for the out-of-sample regions. Model results for the out-of-sample regions using SVM-STV were very low with respect to F1-scores, below 0.15 for the 'increase' and 'decrease' classes.
Applying the E-ReCNN and SVM-STV models on image sets with a variety of spectral channels provided inference regarding how each channel of Sentinel-2 influenced model behavior. The 3-channel RGB image resulted in roughly equivalent F1-scores for both the E-ReCNN model and the SVM-STV model across the MDD regions (Figure 9). The addition of the near-infrared and shortwave infrared (1 and 2) channels, which are often used to define water surfaces with the help of a water index (6-channel image), improved F1-scores for both models, whereas further including the ultra-blue and red-edge channels (10-channel image) resulted in no additional improvement. Notably, E-ReCNN results appeared to be more accurate than SVM-STV for both the 6-channel and 10-channel image sets.

Discussion
In the context of land-use change, particularly change associated with ASGM, understanding how features across a landscape change in size and reflectance can provide critically important information for conservation and environmental policy enforcement. We show that an extension of an existing ReCNN detects multi-temporal change across landscape features more accurately than an existing semi-supervised model (SVM-STV). E-ReCNN outperformed SVM-STV and unsupervised methods considerably for both the focal region in Madre de Dios and the out-of-sample test regions with respect to F1, precision, and recall. Notably, E-ReCNN generated greater F1, precision, and recall values for the detection of water occurrence and the multi-temporal change in spectral response for each pond feature. Estimates of precision and recall for pond sediment decreases (82.8% and 86.1%) and increases (70.6% and 87.3%) within the MDD show that this method is capable of generating multi-temporal feature-based change maps. These results provide evidence that this method has wide applicability to the field of environmental change detection and monitoring.
One ongoing challenge in the use of satellite data for change detection relates to how atmospheric conditions can cause complications when attempting to document fine-scale feature-oriented change. Although data from the major remote-sensing platforms, such as Landsat, Sentinel, and MODIS, are routinely processed and corrected via well-established and formalized techniques [54][55][56][57], persistent variability in surface reflectance from image to image requires careful consideration when monitoring temporal trends. We tested a number of data preprocessing approaches to understand how these challenges could be addressed and what steps can be taken to improve machine learning model results. We found that histogram matching, which has primarily been used in remote sensing to denoise atmospheric effects on image mosaics [58,59] and recently in change detection [60][61][62], improved outcomes for the supervised model, E-ReCNN. In contrast, the semi-supervised model, SVM-STV, produced its most accurate results when the L*a*b* color space variables were included. Some remote-sensing studies of surface water successfully identified patterns and trends without using histogram matching (e.g., [63][64][65]). However, we note that these studies focus on large-scale changes in deep surface water extent/presence, wherein atmospheric noise plays less of a factor. We find that these preprocessing steps are necessary to achieve optimal results when attempting to identify more subtle changes in water reflectance. The preprocessing steps should be considered in land-use change detection workflows, particularly if top-of-atmosphere products are utilized.
In addition to preprocessing methods, decisions regarding the inclusion of specific channels of remotely sensed data into models for analyzing LCLU change dynamics are important to ensure accurate outcomes. Critical tradeoffs between sensor spatial resolution, temporal resolution, and the availability of spectral channels can constrain the scope of land-cover change analysis. In the context of ASGM mining pond detection and classification, where patterns across years and seasons are evident, newly established commercially available satellite imagery (PlanetScope, DigitalGlobe) provides the temporal and spatial resolution necessary to detect fine-scale changes; however, these products generally are only available in a narrow set of channels. We found that the supervised E-ReCNN model generated the best outcomes in the 6-Channel and 10-Channel datasets after histogram matching, with significantly lower F1-scores in the 3-Channel dataset. When we applied lifting using L*a*b* color space variables, accuracies were either unchanged (for the 10-channel dataset) or slightly decreased (for the 3- and 6-channel datasets). Consequently, we conclude that the selection of RGB images for this type of change detection may result in inferior outcomes compared to datasets with a greater number of channels in the infrared and red-edge spectrum. Commercial satellite data that lack these channels may therefore be limited in detecting important changes in aquatic systems, at least in comparison to other options.
Our modeling results showed a notable difference in accuracy between supervised and semi-supervised methods. Although the novel unsupervised learning methods presented in the literature show a great deal of potential for change detection [66][67][68], when we utilized one such unsupervised learning method [69], model performance results were substantially weaker than those provided by E-ReCNN and SVM-STV. Thus, we did not include detailed results regarding using unsupervised learning techniques for this problem. Our semi-supervised method, SVM-STV, in general, fits the data effectively by making use of a small fraction of labels, especially when only RGB data is provided. The results indicate that if a small, labeled set of a mining region in MDD is retrieved, the SVM-STV method can be trained on a desktop computer in a matter of minutes and produce reasonable results for both binary and multi-class classification. In practice, users can decide the number of expert-generated labels to acquire based on their needs, with the caveat that a fully supervised model may be more accurate and precise. In addition, if RGB images are necessary for detecting rapid change at localized scales, lifting using the L*a*b* color space generates enhanced results compared to data that has not been preprocessed.
Supervised model performance varied across MDD training regions (Appendix A Table A4) with respect to temporal change but was consistent across regions for detecting change/no change. Change detection F1-scores were higher in regions using water cannons relative to regions using earth-moving equipment. For example, Region 4 within La Pampa is characterized by oval ponds with distinct edges surrounded by bare ground (Figure 10, top). This region has been heavily mined using suction pumps to displace water into mining ponds and small sluices to separate fine sediment from larger stones and pebbles. Comparatively, Region 12 in Huepetuhe (Figure 10, bottom) features the signature of the use of bulldozers and excavators to move sediment for processing; consequently, this region lacks distinct ponds with clear edges as in Region 4. We suspect that the lack of defined edges of water bodies provided an additional challenge for convolutional filters within E-ReCNN, leading to a decrease in the F1-score in Region 12. The results indicate that regions where mechanized mining is more prevalent are modeled with lower scores for detecting increases and decreases in pond reflectance. Consequently, monitoring and modeling directional pond change may be more difficult in areas with differing mining typologies. However, outcomes for detecting change/no-change and water existence were excellent for both methods (Appendix A Tables A6 and A7).
Whereas model outcomes were generally accurate across MDD regions, with slight differences between areas with different mining types, model results in the out-of-sample international regions were slightly less accurate with respect to multi-class change detection. However, the E-ReCNN out-of-sample results were still within 10% of the focal region results. This method thus retains significant performance in detecting change/no change and water occurrence in ASGM sites in different contexts and on different continents. With respect to pond increases and decreases in turbidity, both supervised and semi-supervised models generated significantly lower recall and precision for international sites compared to the MDD region results. Semi-supervised results using SVM-STV were extremely inaccurate (Figure 8, right), indicating that using this method is not advisable for accurate change detection. Supervised model results were less accurate for these out-of-sample regions, but still detected binary classes of change quite accurately overall. The construction of regional label sets may improve performance for detailed questions regarding pond status, but for general detection of ASGM-associated mining ponds, the supervised model appears suitable for inference worldwide. More thorough investigations at known mined sites across the tropics would provide greater detail regarding the variability of model performance in new regions.

Conclusions and Future Work
In this paper, we describe the creation of a unique ASGM Residual Ponds dataset as well as a new supervised method (E-ReCNN) for detecting fine-scale changes in the environment using satellite imagery. We show how this method compares favorably to existing semi-supervised (SVM-STV) methods. We applied different preprocessing operations on three image sets with different quantities of multispectral bands to analyze their influence on the models' results. For Sentinel-2 imagery, using a 6-band image set yielded higher model performance than other band combinations, even those that included more spectral information. Preprocessing was essential to model performance, even on well-curated Sentinel-2 data, increasing model F1-scores from roughly 0.71 to 0.88 for 6-band images. For fine-scale change detection, we conclude that these images need noise reduction and calibration, such as histogram matching for E-ReCNN and the addition of the L*a*b* color space to the SVM-STV model. Given this finding, practitioners using other change-detection methods on available satellite imagery, particularly with respect to water detection, may benefit from revisiting their results and investigating whether inaccuracies were due to preprocessing impacts.
Practitioners wishing to use the methods presented in this manuscript should consider the practical and computational demands of both change detection models. We found a trade-off between classification performance and computational demands for the two methods: the more accurate model was also the more expensive to train. Since the SVM-STV model can be trained on local machines, it is an efficient solution under the conditions of limited channels and resources. In contrast, because the E-ReCNN model consists of CNN and LSTM subnetworks, it requires lengthy training times on GPUs (Appendix A Table A3). However, it is worth noting that the E-ReCNN model, once trained a single time, appears to be capable of extension to out-of-sample regions with minimal loss in performance, and therefore once this process is completed, this method can be applied globally.
Future work may allow for an improvement of the SVM-STV model, particularly since the training size of labeled pixels used in this test was small and likely contained outliers and noisy pixels that could affect the quality of the model. Although histogram matching reduces radiometric differences in bitemporal images, the disparities among training regions can be significant and influential. Instead of randomly selecting training pixels for a generalized model, kernel density estimation may be used as an indicator that gives information about the "commonness" of each pixel [69,70]. This allows for the exclusion of outliers by only selecting pixels at high densities, generating more consistent test results. Furthermore, the SVM-STV model may be improved by including an active learning scheme, which takes into account the practical condition that there is a restricted budget for label collection. The diffusion geometry of the data can be used to push the approach even further by reducing the number of labels needed but producing greater performance [69][70][71].
Follow-up work on E-ReCNN may allow for the application of this model to other landscapes and environmental topics. While we investigated bi-temporal imagery sets in this analysis, the performance of E-ReCNN across a multitemporal image set may offer information regarding model transferability for decadal estimates of change across a landscape. Furthermore, testing E-ReCNN for use with other environmental features for the detection of change, such as fields, roads, and vegetation patches, may allow for a broad expansion of this supervised method to help monitor environmental change in other contexts and locations.

Figure 1. Mining ponds in La Pampa, Peru, showing a range of activity levels. Deep green ponds indicate the presence of algae and the cessation of mining activity. Chalky clay-colored ponds contain high levels of suspended sediment and are currently being actively mixed. Light green ponds, such as the one in the center of the image, are transitioning from active status to inactive status.

Figure 2. The sixteen selected sample regions are shown in different transparent colors in the Madre de Dios (MDD) area of Peru on Google Earth Engine (GEE), as seen on 23 July 2021. For more geographical details about each sample region, see Table A1 in Appendix A and Figure A1 in Appendix B.

Figure 3. We manually labeled the state of each pond with the Labelbox tool, using RGB and SWGB composite images. The composite images (a) RGB and (b) SWGB display a multitude of ponds clearly, with label categories (c) affixed to these images. Using a color index [(green − red)/(red + green)], ponds can be differentiated with respect to the presence of sediment and photosynthetic material and described as (d) active, (e) inactive, and (f) transition ponds.
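The color index in the caption above can be written as a short worked example. The function name and the reflectance values are hypothetical, and no class thresholds are implied; the sketch only shows how the normalized difference behaves for greener versus redder water.

```python
def green_red_index(green, red):
    """Normalized difference (green - red) / (red + green) for one pixel."""
    denom = red + green
    if denom == 0:
        # Assumption: treat pixels with no signal in either band as neutral.
        return 0.0
    return (green - red) / denom


# Hypothetical reflectances: greener water gives a positive index,
# redder (sediment-laden) water gives a negative index.
print(green_red_index(green=0.3, red=0.1))
print(green_red_index(green=0.1, red=0.3))
```

The index is bounded between −1 and 1 for non-negative reflectances, which makes values comparable across ponds of different overall brightness.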

Figure 4. The E-ReCNN model uses two cloud-free Sentinel-2B images obtained at two different times (18 August 2019 and 23 July 2021) for the same region. Following histogram matching and augmentation, we used a convolutional kernel (Conv2d) on 5 × 5-pixel patches across each image to generate a feature array. These feature arrays then served as the input of the first LSTM layer, and the second LSTM layer was formed following a dropout of 0.2. The last two layers were fully connected, and an output layer was applied with sigmoid/softmax functions to recognize the change in the pond's status.

Figure 5. Overview of the SVM-STV method for mining change recognition. Bi-temporal images from a region were used as inputs. Preprocessing steps utilized histogram matching and lifting with L*a*b* color data for both images. Labeled points from different images were used to train the ν-SVM in the first stage, which was then used to generate probability maps. In the second stage, spatial information was utilized by denoising the probability tensor. The final classification results were obtained by taking the index of the maximum probability of each pixel to detect change.

Figure 6. (left) Average scores of model performance across the 16 MDD regions for both E-ReCNN and SVM-STV. The highest accuracies were generated with the 6-channel set of histogram-matched data for E-ReCNN and with the 10-channel data for SVM-STV. Blue dots for the Kappa coefficient of the SVM-STV results indicate outliers that fall above or below the variance line edges. (right) Average scores of model performance for out-of-sample test regions in Indonesia, Myanmar, and Venezuela for both E-ReCNN and SVM-STV. The highest accuracies were generated with the 6-channel set of histogram-matched data for both E-ReCNN and SVM-STV. For both left and right, blue, orange, and gray boxes represent the distributions of Cohen's Kappa coefficients, Jaccard coefficients, and F1-scores, respectively.

Figure 7. Confusion matrices for the E-ReCNN model for 6-Channel set histogram-matched images from the MDD focal regions (left) and out-of-sample prediction regions (right). For both (left) and (right), recall and precision matrices are featured to the right and below the main confusion matrix, respectively. Arrays at the bottom of both (left) and (right) show the F1-score for each class.
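The per-class quantities reported alongside these confusion matrices, together with the Jaccard and Cohen's Kappa coefficients used elsewhere in the evaluation, can be derived from a confusion matrix as in the sketch below. It assumes that rows hold the true classes and columns the predicted classes; the function names and the toy two-class matrix are hypothetical and are not results from this study.

```python
def per_class_scores(cm):
    """Precision, recall, F1, and Jaccard per class.

    Assumes cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of the true-class row
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of the predicted column
        results.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
            "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        })
    return results


def cohens_kappa(cm):
    """Cohen's Kappa: agreement beyond what the class marginals predict by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / total ** 2
    if expected == 1.0:
        # Degenerate case: every sample falls in a single class.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy matrix (hypothetical counts): rows are true classes, columns are predictions.
toy = [[40, 10],
       [5, 45]]
print(per_class_scores(toy)[0])
print(cohens_kappa(toy))
```

For the toy matrix, class 0 has 40 true positives, 10 false negatives, and 5 false positives, so its recall is 40/50 and its precision is 40/45.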

Figure 8. Confusion matrices for the SVM-STV model for 10-Channel image sets from the MDD focal regions (left) and 6-Channel histogram-matched images for the out-of-sample regions (right). For both (left) and (right), recall and precision matrices are featured to the right and below the main confusion matrix, respectively. Arrays at the bottom of both (left) and (right) show the F1-score for each class.

Figure 9. Analysis of multi-class results by F1-score according to the number of channels. Results for histogram-matched images are shown for the 3-, 6-, and 10-Channel sets. The E-ReCNN model results had higher accuracy than those of the SVM-STV model. On average, the 6-Channel results were more accurate and had a smaller standard deviation than the 3- and 10-Channel results.

Figure 10. (a) True-color image composite of Region 4 in La Pampa; (b) overlay of the semi-manual label maps and model-predicted results for Region 4; (c) zoomed view of the ponds in Region 4; (d) true-color image composite of Region 12 in Huepetuhe; (e) overlay of the semi-manual label maps and model-predicted results for Region 12; (f) zoomed view of the ponds in Region 12; and (g) results table for Regions 4 and 12. These two regions show how mining using different practices may generate more or less uniform surface water bodies. In (b) and (e), white and shades of gray represent accurate classification, shades of magenta represent overestimated sediment, shades of green represent underestimated sediment, and black represents no detected change. While Region 4 from La Pampa has deeper and more circular ponds that are separated by sand, Region 12 from Huepetuhe has shallower, smaller, and more intricate ponds mixed with sand and bare ground, which appears to impact accuracy metrics.

Figure A1. All images in MDD were used for LoRo experiments.

Figure A2. All test images from Indonesia, Myanmar, and Venezuela.

Figure A3. Binary change detection best performance.

Table A3. Calculated costs of model training.

Table A4. Accuracies of regions and their areas for the E-ReCNN model for histogram-matched 6-Channel images. Region 4 and Region 12 were analyzed in the Discussion section and are given as examples of the different kinds of ponds.

Table A5. The regions in different parts of the world, with their sizes in pixels, their areas in km², and the latitude and longitude of their bottom-left and top-right corners.