Machine Learning-Enhanced River Ice Identification in the Complex Tibetan Plateau

Pang, Xin; Li, Hongyi; Ren, Hongrui; Yang, Yaru; Zhao, Qin; Liu, Yiwei; Hao, Xiaohua; Niu, Liting

doi:10.3390/rs17111889


by Xin Pang ¹,², Hongyi Li ²,³,*, Hongrui Ren ¹, Yaru Yang ²,³, Qin Zhao ²,³, Yiwei Liu ²,³, Xiaohua Hao ²,⁴ and Liting Niu ²,³

¹ College of Geological and Surveying Engineering, Taiyuan University of Technology, Taiyuan 030024, China

² State Key Laboratory of Cryospheric Science and Frozen Soil Engineering, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China

³ University of Chinese Academy of Sciences, Beijing 100049, China

⁴ Heihe Remote Sensing Experimental Research Station, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(11), 1889; https://doi.org/10.3390/rs17111889

Submission received: 7 April 2025 / Revised: 16 May 2025 / Accepted: 27 May 2025 / Published: 29 May 2025

(This article belongs to the Special Issue Glacial Lakes and Related Hazards: Mapping, Monitoring, and Risk Assessment)


Abstract

Accurate remote sensing identification of river ice not only provides scientific evidence for climate change but also offers early warning information for disasters such as ice jams. Many researchers have used remote sensing index-based methods to identify river ice in alpine regions. However, in high-altitude areas these index-based methods are limited in recognizing river ice and distinguishing ice-snow mixtures. With the rapid advancement of machine learning techniques, some scholars have begun to use machine learning methods to extract river ice at high northern latitudes. However, systematic studies of how far machine learning can enhance river ice identification in high-altitude, complex terrain are still lacking. This study evaluates the performance of machine learning methods (RF, SVM, and KNN) and the RDRI index method across six aspects: river type, altitude, river width, ice period, satellite data, and snow cover interference. The results show that machine learning, particularly the RF method, achieves superior generalization and higher recognition accuracy for river ice in the complex high-altitude terrain of the Tibetan Plateau by leveraging a variety of input data, including spectral and topographic information. The RF model performs best overall across the test conditions, with an average Kappa coefficient of 0.9088, outperforming the other machine learning methods and substantially outperforming the traditional index-based method. Machine learning methods adapt to different types of river ice and show particularly improved recognition of river ice in braided river systems. RF and SVM recognize river ice more accurately across different altitudinal gradients, and they significantly improve the identification accuracy of narrow river ice (0–90 m wide) on the plateau. RF and SVM also delineate river ice boundaries more precisely across different ice periods. Additionally, RF generalizes better when transferred to multisource satellite data. RF performs well under different snow cover conditions, overcoming the limitations of traditional methods in identifying river ice under thick snow. Machine learning methods, which are well suited to learning from large samples and generalize strongly, show significant potential for river ice identification in high-altitude, complex terrain.

Keywords:

river ice remote sensing; machine learning; Landsat 8; Tibetan Plateau

1. Introduction

River ice is a typical natural phenomenon in alpine regions, usually forming in winter and early spring, and exhibits significant seasonal and regional characteristics [1,2,3]. The formation and melting of river ice play a crucial role in regulating river flow and flood defense [4,5], and also have a significant impact on the ecological environment and socio-economic activities. Therefore, timely monitoring of river ice not only provides early warnings for floods and ice storms, reducing potential disaster risks, but also offers critical data support for studying global warming [6,7].

However, the complex topography and harsh climatic conditions in high-altitude regions such as the Tibetan Plateau pose great challenges to the field observation of river ice. With the continuous development of satellite data and remote sensing technology, remote sensing has become a common tool for large-scale river ice monitoring on the Tibetan Plateau. At present, scholars have mainly utilized the spectral properties of river ice in the visible, near-infrared, and short-wave infrared bands to identify river ice on the Tibetan Plateau [8], and significant progress has been made. However, the identification method based on the spectral properties of river ice still has some limitations in monitoring river ice on the Tibetan Plateau.

For example, Li et al. constructed the NDSI index based on the fact that river ice has a higher reflectance in the visible band and a lower reflectance in the near-infrared band. Using the NDSI, they conducted long-term monitoring of river ice in the Babao River basin and revealed the distribution characteristics and changing trends of river ice at the basin scale on the Tibetan Plateau [9]. However, the NDSI cannot effectively distinguish between river ice and snow, which limits its application to large-scale river ice identification on the plateau. Li et al. further proposed the Relative Difference River Ice Index (RDRI), based on the reflectance characteristics of river ice in the red, near-infrared, and shortwave infrared bands. It effectively distinguishes snow from river ice and was used to monitor river ice across the Tibetan Plateau [10]. However, the RDRI fails to identify river ice effectively under deep snow and tends to miss smaller, thinner ice. Wang et al. constructed an Improved Dual Logistic Regression Model (IDLRM) to analyze the phenological patterns of river ice in the Northern Hemisphere, but the model still relies primarily on preset thresholds for river ice identification [11]. Furthermore, Wang et al.'s research focuses mainly on river segments wider than 90 m. This leaves a gap in identifying river ice on the many narrower rivers of the Tibetan Plateau.

With the rapid development of machine learning technology, machine learning algorithms have been widely applied in remote sensing, bringing new opportunities for river ice identification. Han et al. combined Landsat-8 OLI data with atmospheric and surface elevation data and used an RF algorithm to identify snow-covered and snow-free river ice in some sections of the Han River. They found that the RF model could identify river ice effectively even under atmospheric pollution [12]. However, it remains unclear how well machine learning methods can identify river ice from surface reflectance satellite data in the high-altitude terrain of the Tibetan Plateau. Temimi et al. used NOAA-20 and NPP satellite data, combined with the U-Net deep learning algorithm, to monitor river ice in the northern watersheds of the United States and Canada. However, their study focused on large- and medium-scale rivers with widths of 375 m or more [13], leaving the ability of machine learning methods to identify river ice in small rivers on the Tibetan Plateau unexamined. Madaeni used convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to predict ice jams in parts of the Quebec region [14]. However, because training data were insufficient, the model's ability to predict river ice formation in early winter requires further exploration.

As mentioned above, several studies have used machine learning methods to recognize river ice, but the ability of these methods to recognize river ice in the complex, high-altitude terrain of the Tibetan Plateau has not been studied systematically. To fill this gap, this study uses random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN) classifiers to identify river ice on the Tibetan Plateau. We evaluate these methods and the traditional index-based method across six characteristics: river type, altitude, river width, ice period, satellite data, and snow cover interference. From this evaluation, the study examines how much machine learning improves river ice identification in high-altitude complex terrain, and why.

2. Study Area and Data

2.1. Study Area

The Tibetan Plateau is the world’s highest plateau, with an average elevation exceeding 4000 m. It has a unique alpine climate and abundant ice and snow resources. The plateau has a vast river network, serving as the source of Asia’s major rivers [15,16]. The cold climate of the Tibetan Plateau leads to the formation of river ice in many rivers during winter [17], especially in higher-altitude regions, such as the upper reaches of the Yarlung Tsangpo and Nujiang Rivers. Since rivers often pass through valleys with significant terrain undulations [18], river ice is affected by factors such as altitude, river channel type, water depth, and water quality, and the morphology of river ice formed in different river sections varies significantly.

Figure 1 shows the sample points and the distribution of validation regions on the Tibetan Plateau [19]. To capture the diversity of river ice under the plateau's complex topography, 16 validation areas were selected based on six factors: river type, elevation, river width, ice period, satellite data, and snow cover interference. Twenty images with cloud cover below 30% were selected to represent the characteristics of river ice in high-altitude, complex terrain (Figure 2).

2.2. Satellite Data

The study primarily used Landsat 8 Collection 2, Level 2, Tier 1 surface reflectance data (hereafter referred to as Landsat 8) as training and validation data. The Landsat 8 data accessed through the Google Earth Engine (GEE) platform already meet geometric and radiometric quality requirements. This dataset has a 16-day revisit period. It includes four visible bands (including the coastal aerosol band), one near-infrared (NIR) band, and two shortwave infrared (SWIR) bands, all with a spatial resolution of 30 m [20].

The study used 623 Landsat 8 images, which were composited into a single image of the Tibetan Plateau. The composite was displayed as a false-color image to highlight river ice for sample selection. The spectral values and coordinates of the composite image were included in the feature dataset. Additionally, 17 Landsat 8 images, one Landsat 5 image, one Landsat 7 image, and one Sentinel-2 image were visually interpreted to extract river ice as validation data.

2.3. Additional Data

Temperature, precipitation, elevation, and slope are significantly correlated with river ice formation. Their combined effects can alter water flow and environmental conditions, affecting the formation and duration of ice. Therefore, the study incorporated climate and topographic data into the feature set for the machine learning models. Climate data, such as temperature, were derived from the ERA5-Land Daily Aggregated dataset available on the GEE platform. This high-resolution reanalysis dataset provides daily global meteorological data with a spatial resolution of approximately 11 km [21]. Topographic data, including elevation and slope, were obtained from the NASADEM digital elevation dataset on GEE, with a spatial resolution of 30 m [22].

The study also used river network data, glacier inventory data, lake data, and the vector boundary of the Tibetan Plateau to delineate the river ice extent. The river network data played a pivotal role in this delineation. They came from the HydroSHEDS database, which includes approximately 8.5 million river reaches worldwide. The database covers reaches with a catchment area of at least 10 km² or an average discharge of at least 0.1 m³/s [23]. This dataset therefore covers small, medium, and large rivers on the Tibetan Plateau.

3. Methodology

3.1. Overall Scheme

The study used three common supervised machine learning methods available in GEE and designed an overall scheme for evaluating how much machine learning improves river ice identification on the Tibetan Plateau (Figure 3). First, river ice extent data and sample data were collected and processed. Second, the machine learning models were trained, and the RDRI threshold was recalibrated. Third, six aspects (river type, altitude, river width, ice period, satellite data, and snow cover interference) were selected to represent the diverse river ice features of the plateau's complex terrain. Finally, the machine learning methods and the traditional RDRI method were compared to assess how far machine learning improves river ice identification in high-altitude, complex terrain.

3.2. Machine Learning for River Ice Identification

3.2.1. Delineation of River Ice Extent

The study delineated the spatial extent of river ice to minimize interference from bare soil and other features and to make model training more efficient. First, the river vector data on the plateau were buffered by 1000 m [23]. This buffer accounts for river course changes and the limited timeliness of the river network data. Glacier inventory data and lake data were then used to exclude glaciers and lake ice within the buffer zone [24,25]. The resulting river extent on the Tibetan Plateau served as the base data for sample selection and model training.
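The buffer-and-mask logic can be illustrated on a toy raster. The sketch below is not the GEE workflow used in the study. It assumes that the river network, glacier and lake layers have already been rasterized onto a common pixel grid. The function name `delineate_river_extent` and the pixel-based buffer radius `buffer_px` are hypothetical. At 30 m resolution, a 1000 m buffer corresponds to roughly 33 pixels.

```python
import math


def delineate_river_extent(river, glacier, lake, buffer_px):
    """Return a boolean grid marking candidate river ice pixels.

    river, glacier, lake: 2D lists of booleans on the same pixel grid.
    buffer_px: hypothetical buffer radius in pixels (about 33 px for 1000 m at 30 m).
    """
    rows, cols = len(river), len(river[0])
    river_cells = [(r, c) for r in range(rows) for c in range(cols) if river[r][c]]
    extent = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Brute-force buffer: a pixel is inside if any river pixel lies
            # within buffer_px (Euclidean distance). A distance transform is
            # the usual choice for large rasters.
            near_river = any(
                math.hypot(r - rr, c - cc) <= buffer_px for rr, cc in river_cells
            )
            # Exclude glacier and lake pixels so their ice is not sampled as river ice.
            extent[r][c] = near_river and not glacier[r][c] and not lake[r][c]
    return extent
```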

3.2.2. Sample Data and Validation Data

Considering the temporal differences and sensor misalignments between Sentinel-2 and Landsat-8 imagery, which may introduce additional uncertainties, both the sample data and validation data in this study were primarily derived from Landsat-8 imagery. To minimize the impact of cloud and shadow interference while reducing computational load on the GEE platform, sample selection was conducted based on a median composite image generated from Landsat-8 data [26]. Specifically, 623 Landsat-8 scenes with less than 30% cloud cover, acquired between January and April 2022, were used to construct the composite image, which was visualized using standard false-color synthesis to highlight river ice features.
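A median composite keeps, for each pixel, the median of all valid observations. Occasional bright cloud remnants or dark shadows in individual scenes therefore have little influence on the result. The minimal sketch below assumes a single band and assumes that cloud and shadow pixels have already been masked to `None`. `median_composite` is a hypothetical helper; in GEE, the equivalent per-pixel reduction is available as `ImageCollection.median()`.

```python
from statistics import median


def median_composite(scenes):
    """Per-pixel median of a stack of co-registered single-band scenes.

    scenes: list of 2D lists of equal shape; None marks a pixel masked as
    cloud or cloud shadow. Pixels with no valid observation stay None.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect only unmasked observations for this pixel.
            valid = [scene[r][c] for scene in scenes if scene[r][c] is not None]
            out_row.append(median(valid) if valid else None)
        composite.append(out_row)
    return composite
```

Because the median ignores extreme values, a single cloud-contaminated observation in an otherwise clear stack does not change the composite value. This property makes a median composite suitable for limiting cloud and shadow interference.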

Given the relatively low proportion of river ice among all land-cover types, strictly following the actual area ratio during sample selection could lead to an insufficient number of river ice samples, which would negatively affect classifier training [27]. Despite efforts to reduce interference from non-river ice features, the proportion of river ice remained low compared to other land types. To address this issue and enhance model generalization, the number of river ice samples was increased during visual interpretation. Ultimately, a total of 19,759 samples were visually interpreted from the composite image of the Tibetan Plateau, including 8777 river ice samples and 10,982 samples of other land-cover types, such as water bodies, snow, and bare land.

In addition, to further improve the accuracy of visual interpretation, high-resolution Sentinel-2 imagery from the same or nearby periods was introduced as an auxiliary reference. This supplementary information was particularly helpful in improving the classification of mixed pixels near river ice boundaries, thereby enhancing the accuracy and reliability of both the sample and validation data. Similarly, the validation data were obtained by visually interpreting individual satellite images to delineate river ice extent and generate corresponding ground truth data.

3.2.3. Model and Feature Set Construction

In this study, three representative machine learning algorithms (SVM, KNN, and RF) were selected for river ice classification using remote sensing data [28,29,30]. To ensure that the methods identified river ice under comparable parameter conditions, default parameters were chosen for KNN and SVM in GEE. For KNN, k was set to 1, with the search method set to AUTO and the distance metric to EUCLIDEAN. For SVM, the RBF kernel was used with the cost parameter C set to 1. RF requires the number of trees to be specified; after multiple experiments this was set to 100, and the other parameters were kept at their defaults. These largely default settings allow each method's ability to identify river ice across the validation areas on the plateau to be evaluated without method-specific tuning.

This study developed a feature dataset of 23 variables in five groups: spectral, climatic, terrain, texture, and spatial location features. The specific variables and their data sources are detailed in Table 1. Spectral and spatial features were derived from Landsat 8 and Sentinel-2 images. Climatic features were extracted from the ERA5-Land dataset and comprised two variables relevant to river ice formation: near-surface (2 m) air temperature and precipitation. Terrain features were derived from the 30 m NASADEM data and included elevation, slope, and aspect, which help capture terrain-related differences between ice and non-ice pixels. Textural features were computed from the near-infrared band (SR_B5), yielding four texture metrics.

To prevent overfitting and minimize irrelevant information, the feature set was organized into five nested subsets, and each machine learning method was trained separately on every subset. For each method, the subset with the highest accuracy was selected for final model training. The subsets are:

- Subset 1: spectral features
- Subset 2: spectral and climate features
- Subset 3: spectral, climate, and terrain features
- Subset 4: Subset 3 plus spatial location features
- Subset 5: all five feature groups
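Each subset adds one feature group to the previous one, so the five subsets can be generated as cumulative prefixes of an ordered list of groups: spectral, climate, terrain, spatial location, texture. This is a minimal sketch with hypothetical group labels and helper names. It does not reproduce the 23 individual variables of Table 1. `best_subset` simply returns the subset with the highest cross-validated Kappa for one method.

```python
# Hypothetical group labels, ordered so that Subset k = the first k groups.
FEATURE_GROUPS = ("spectral", "climate", "terrain", "spatial_location", "texture")


def build_nested_subsets(groups=FEATURE_GROUPS):
    """Map subset number (1-5) to the list of feature groups it contains."""
    return {k: list(groups[:k]) for k in range(1, len(groups) + 1)}


def best_subset(kappa_by_subset):
    """Return the subset number with the highest cross-validated Kappa."""
    return max(kappa_by_subset, key=kappa_by_subset.get)
```

Calling `best_subset` separately for RF, SVM and KNN mirrors the per-method selection reported in Section 4.1.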

3.3. River Ice Identification Using the RDRI Index

The RDRI method rests on the observation that river ice and snow show a larger difference between visible and near-infrared reflectance than other land-cover types. In addition, the sum of the NIR and SWIR1 reflectance of snow is greater than that of river ice, which further separates river ice from snow. Additional conditions are imposed to overcome interference from water, shadows, and the surrounding landscape. The formulas are as follows:

RDRI = \frac{R_{red} - R_{nir}}{R_{nir} + R_{swir}} \geq thr_{1}

(1)

R_{red} - R_{nir} \geq 0.068

(2)

R_{nir} \geq 0.1

(3)

where R_{red}, R_{nir}, and R_{swir} denote the reflectance in the red, near-infrared, and shortwave infrared (SWIR1) bands (Bands 4, 5, and 6) of Landsat 8, respectively, and thr_{1} is the threshold in Formula (1). Formulas (2) and (3) suppress interference from water and cloud shadows.
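Formulas (1) to (3) translate into a per-pixel test. The sketch below assumes reflectance already scaled to the 0 to 1 range; for Landsat 8 Collection 2 Level 2 data, this means applying the product's scale factor and offset first. `thr1` is left as a parameter because its calibrated value is determined separately. The function names are hypothetical.

```python
def rdri(red, nir, swir):
    """Relative Difference River Ice Index, Formula (1)."""
    return (red - nir) / (nir + swir)


def is_river_ice(red, nir, swir, thr1):
    """Apply Formulas (1)-(3) to one pixel's reflectance (0-1 scale).

    red, nir, swir: Landsat 8 Bands 4, 5 and 6; thr1: calibrated threshold.
    """
    if nir + swir <= 0:  # guard against division by zero on invalid pixels
        return False
    return (
        rdri(red, nir, swir) >= thr1  # Formula (1): relative difference index
        and red - nir >= 0.068        # Formula (2)
        and nir >= 0.1                # Formula (3)
    )
```

Once `thr1` has been calibrated, the test can be applied to every pixel inside the delineated river extent.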

Because brightness and contrast vary across remote sensing images, the original RDRI threshold may be unsuitable for this study area. The threshold was therefore recalibrated using three validation zones (A, B, and C) that represent different river types. The RDRI lower limit was increased in steps of 0.01, and for each value the Kappa coefficient was calculated on an independent random sample set. In each zone, the lower limit with the highest Kappa coefficient was taken as the optimal threshold. Finally, the unified threshold for river ice identification in all 20 validation zones was obtained by averaging the optimal thresholds of these three zones.
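The calibration procedure amounts to a one-dimensional grid search. In the minimal sketch below, the search range of 0 to 1, the helper names, and the binary-label Kappa helper are assumptions. For simplicity, only the Formula (1) threshold is varied and Formulas (2) and (3) are not applied. The sketch therefore illustrates the search logic rather than reproducing the study's results.

```python
def kappa(y_true, y_pred):
    """Cohen's Kappa for binary labels (1 = river ice, 0 = other)."""
    n = len(y_true)
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_ice, pred_ice = sum(y_true), sum(y_pred)
    p_e = (true_ice * pred_ice + (n - true_ice) * (n - pred_ice)) / n**2
    # Kappa is undefined when chance agreement is 1; return 0 in that case.
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0


def calibrate_threshold(rdri_values, labels, start=0.0, stop=1.0, step=0.01):
    """Grid-search the RDRI lower limit that maximizes Kappa in one zone."""
    best_thr, best_k = start, float("-inf")
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        thr = round(start + i * step, 10)  # avoid floating-point drift
        preds = [1 if v >= thr else 0 for v in rdri_values]
        k = kappa(labels, preds)
        if k > best_k:  # keep the first (lowest) threshold on ties
            best_thr, best_k = thr, k
    return best_thr, best_k


def unified_threshold(zones):
    """Average the optimal thresholds of several zones (A, B and C in the text).

    zones: list of (rdri_values, labels) pairs, one per calibration zone.
    """
    optimal = [calibrate_threshold(values, labels)[0] for values, labels in zones]
    return sum(optimal) / len(optimal)
```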

3.4. Accuracy Assessment Metrics

Two validation methods were used: cross-validation to evaluate the machine learning models during training, and independent validation to evaluate them when applied to the validation areas. To assess training accuracy, the sample dataset was randomly divided into ten equal parts, seven of which were used for training and three for validation. A confusion matrix was constructed by comparing the classified river ice image with the held-out validation samples [31], and the Kappa coefficient was used as the accuracy metric.

For independent validation, the actual river ice extent derived from visual interpretation served as the ground truth, subsequently rasterized and binarized. A total of 1500 pixels were randomly sampled from the classified river ice imagery and compared against ground truth pixels to construct a confusion matrix and calculate accuracy. The index-based method underwent the same validation process for comparative evaluation.

Accuracy metrics included the Kappa coefficient and overall accuracy (OA). The Kappa coefficient ranges from −1 to 1. A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and −1 indicates complete disagreement. A Kappa value above 0.80 indicates good consistency, a value between 0.40 and 0.80 moderate consistency, and a value below 0.40 poor consistency. The formula is as follows:

Kappa = \frac{p_{o} - p_{e}}{1 - p_{e}}

(4)

where:

- p_{o} = \frac{1}{N}\sum_{i} n_{ii} is the observed agreement, i.e., the proportion of correctly classified pixels, and n_{ii} are the diagonal elements of the confusion matrix;
- p_{e} = \frac{1}{N^{2}}\sum_{i} n_{i+}\,n_{+i} is the agreement expected by chance, and n_{i+} and n_{+i} are the row and column totals of class i;
- N is the total number of sample pixels in the confusion matrix.

Overall accuracy (OA) is the proportion of correctly classified pixels out of the total sample pixels. It ranges from 0 to 1, with higher OA indicating better model performance. However, OA does not account for agreement expected by chance and is used only in Section 4.2.3. The formula is as follows:

OA = p_{o}

(5)

where p_{o} and N have the same meaning as in Formula (4); that is, OA is the number of correctly classified pixels divided by N.
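Formulas (4) and (5) can be computed directly from a confusion matrix. This is a minimal sketch with the hypothetical name `accuracy_metrics`. It assumes that rows are reference classes and columns are predicted classes; the transpose gives identical results. For the independent validation described above, the matrix would be filled from the 1500 randomly sampled pixels.

```python
def accuracy_metrics(cm):
    """Return (OA, Kappa) from a square confusion matrix.

    cm[i][j]: number of sample pixels with reference class i and predicted class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)                      # N, total sample pixels
    p_o = sum(cm[i][i] for i in range(k)) / n            # observed agreement = OA
    row_tot = [sum(cm[i]) for i in range(k)]             # reference class totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2    # chance agreement
    if p_e == 1:
        return p_o, float("nan")  # Kappa is undefined in this degenerate case
    return p_o, (p_o - p_e) / (1 - p_e)
```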

4. Results

4.1. Selection of Optimal Feature Combination

The study cross-validated the training accuracy of the three machine learning methods on each feature subset (Table 2). Each method was then paired with the feature subset that yielded its highest Kappa coefficient. RF performed best, with an average accuracy of 0.9787 and a peak of 0.9837 on Subset 4 (spectral, climate, topography, and latitude/longitude). In contrast, KNN and SVM achieved their highest accuracies with Subset 2 (spectral and climate features), reaching 0.9755 and 0.8851, respectively.

Notably, RF accuracy increased from Subset 1 to Subset 4 but decreased with Subset 5, suggesting that texture features contribute little to river ice identification. KNN and SVM performed best with Subset 2 and worst with Subset 3, suggesting that these distance-based models are strongly influenced by differences in feature scales. Terrain variables such as elevation are measured in meters and span thousands of units, far more than the numeric range of the spectral features. Once terrain features are added, Euclidean distances are therefore dominated by these large-magnitude variables, and the more informative spectral differences are effectively ignored, which reduces identification accuracy.
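The feature-scale argument can be made concrete with a toy two-feature example: one spectral value and one elevation in meters. The samples are invented for illustration and the helper names are hypothetical. Z-score standardization is shown only as one common remedy; the study does not report using it. The same concern applies to the RBF kernel of SVM, which is also a function of Euclidean distance.

```python
import math


def nearest_label(query, samples):
    """1-NN (k = 1) with Euclidean distance; samples are (features, label) pairs."""
    return min(samples, key=lambda s: math.dist(query, s[0]))[1]


def standardize(samples, query):
    """Z-score each feature using the training samples' mean and std."""
    columns = list(zip(*(features for features, _ in samples)))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0  # avoid /0
        for col, m in zip(columns, means)
    ]

    def scale(vec):
        return tuple((x - m) / s for x, m, s in zip(vec, means, stds))

    return [(scale(f), label) for f, label in samples], scale(query)


# Invented samples: (spectral value, elevation in m) -> label.
samples = [((0.60, 4500.0), "ice"), ((0.10, 4420.0), "other")]
query = (0.58, 4440.0)  # spectrally close to "ice", closer in elevation to "other"

raw_label = nearest_label(query, samples)  # -> "other"
scaled_samples, scaled_query = standardize(samples, query)
scaled_label = nearest_label(scaled_query, scaled_samples)  # -> "ice"
```

With raw features, the 20 m and 60 m elevation gaps decide the nearest neighbor almost entirely, so the query is assigned to the spectrally dissimilar sample. After standardization, both features contribute comparably and the spectrally similar sample becomes the nearest neighbor.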

4.2. Accuracy Validation and Comparative Analysis of River Ice Extraction Based on Multi-Feature Inputs

4.2.1. Validation of Transfer Accuracy Across Different River Types

Three channel types (straight, curved, and braided) were used to evaluate the accuracy of the different methods (Figure 4). RF achieved the highest average Kappa coefficient (0.8798), followed by SVM (0.8666), KNN (0.8623), and RDRI (0.6625).

RF was more accurate in regions A and B and slightly less accurate in region C (about 0.85). SVM and KNN performed similarly, but both were less accurate in region B (about 0.82). RDRI performed well in region A, where the terrain is simple, but its accuracy declined markedly in the more complex regions.

Overall, across river types the Kappa coefficients of the machine learning methods were about 0.2 higher than that of RDRI, with the largest differences in regions B and C.

4.2.2. Validation of Transfer Accuracy Across Different Elevation Gradients

To evaluate performance across elevation gradients (2000–6000 m), river ice images were selected using stratified sampling at 1000 m intervals (Figure 5). SVM achieved the highest average accuracy (0.9267), followed by RF (0.9185), while RDRI (0.7864) and KNN (0.4229) performed less effectively.

RF and SVM maintained high accuracy in all regions, with only a slight decrease in region F. In contrast, KNN failed to identify river ice in regions D and F, and its overall performance was poor. RDRI performed moderately, with accuracy consistently lower than that of RF and SVM.

Overall, RF and SVM significantly outperformed RDRI and KNN in identifying river ice at different altitudes.

Area error analysis showed that RF had the lowest average area error (11.47%), closely followed by SVM (11.98%), whereas RDRI showed much larger errors (>38%) (Figure 6). Across elevations, RF and SVM reduced the area error by approximately 27 percentage points compared with the RDRI index.

4.2.3. Validation of Transfer Accuracy Across Different River Widths

River ice on rivers 0–400 m wide was used to evaluate the accuracy of the different methods in 100 m width intervals (Figure 7). All three machine learning methods outperformed the RDRI index. SVM had the highest average accuracy (0.9707), followed by RF (0.9682) and KNN (0.9144), while RDRI had the lowest (0.8986).

SVM and RF showed the most consistent and accurate performance, especially in wider reaches, where their Kappa coefficients were close to or above 0.98. In contrast, all methods were less accurate on the narrower rivers of region H, where RDRI performed worst.

Overall, the machine learning methods (especially SVM and RF) were significantly more accurate than RDRI on narrower rivers.

River ice pixels on rivers 30–210 m wide in regions H and I were extracted through visual interpretation (Figure 8). The overall accuracy (OA) of each method was calculated for these pixels in each width class to compare how accurately the methods extract narrow river ice at the pixel scale. SVM had the highest average OA (0.9246), followed by RF (0.9065), KNN (0.7709), and RDRI (0.7441).
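Pixel-scale OA by width class reduces to counting correctly labeled pixels within each class. In the minimal sketch below, the `(width_m, reference, predicted)` tuple layout and the name `oa_by_width` are hypothetical. The reference pixels in this analysis are river ice, so the OA of a width class equals the fraction of its pixels that a method labels as ice.

```python
from collections import defaultdict


def oa_by_width(pixels):
    """Overall accuracy per river-width class.

    pixels: iterable of (width_m, reference_is_ice, predicted_is_ice) tuples.
    Returns {width_m: OA}, ordered by increasing width.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for width, ref, pred in pixels:
        total[width] += 1
        correct[width] += int(ref == pred)  # 1 if the method agrees with the reference
    return {w: correct[w] / total[w] for w in sorted(total)}
```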

All methods showed lower accuracy in narrow rivers (30–90 m) compared to wider ones (120–210 m), with RF and SVM significantly outperforming RDRI. Notably, in the 30–90 m range, SVM and RF improved the OA by about 0.25 over RDRI, highlighting their superiority in identifying fine river ice.

4.2.4. Validation of Transfer Accuracy Across Different Ice Periods

Validation across different ice periods showed that SVM and RF clearly outperformed RDRI and KNN (Figure 9). SVM achieved the highest average accuracy (0.9566), followed by RF (0.9363), RDRI (0.7773), and KNN (0.4454).

SVM maintained consistently high accuracy (Kappa coefficient > 0.9) across all ice periods, whereas RF was relatively less accurate during the L1 period. RDRI showed significant omission errors during the L1 and L3 periods. KNN failed completely in the L2 period and misclassified substantial snow as river ice during the L1 period.

Overall, RF and SVM provided substantially improved accuracy compared to RDRI, particularly in the L3 period, due to better handling of mixed pixels near river ice edges. KNN demonstrated poor generalization ability.

4.2.5. Validation of Transfer Accuracy Across Different Satellite Data

Validation using Landsat 5, Landsat 7, and Sentinel-2 images showed that RF had the best accuracy (0.9214), followed by SVM (0.8667), RDRI (0.6935), and KNN (0.5051) (Figure 10).

RF achieved high accuracy on all datasets (M1: 0.9719, M2: 0.8421, M3: 0.9501), and SVM performed slightly worse than RF. RDRI performed well on M1 and M3 but poorly on M2, possibly because of striping in that image. KNN's low accuracy across the different satellite datasets again demonstrates its poor generalization ability.

Overall, RF and SVM transferred to different satellite data significantly better than RDRI, especially on M2, where RDRI performed poorly.

4.2.6. Validation of Transfer Accuracy Across Different Snow Cover Conditions

RF performed most consistently and accurately across snow cover conditions, with an average Kappa coefficient of 0.8054 (Figure 11). In contrast, SVM, KNN, and RDRI had significantly lower average accuracies, all below 0.6.

SVM performed well in regions N (0.8964) and P (0.7528), but its accuracy dropped sharply in region O (0.0009). KNN had the lowest overall accuracy (0.1812). It failed to identify river ice under snow and often misclassified snow on land as river ice. RDRI performed relatively well in region N (0.8458) but was less accurate in regions O and P, where it showed large omission errors.

Overall, RF outperformed the other methods under all snow conditions, including the most challenging deep-snow scenario. In contrast, SVM was sensitive to snow depth, RDRI struggled with snow-covered ice, and KNN was susceptible to snow interference under all snow conditions. These results highlight RF's superior ability to recognize river ice in snow-affected environments.

In region P, where river ice was fully covered by snow, both RF and RDRI mainly showed omission errors without significant commission errors (Figure 12). To compare their performance, reflectance values in the red, near-infrared (NIR), and shortwave infrared (SWIR) bands were extracted and analyzed.

The results showed that RF identified a broader range of river ice pixels, especially in areas with low NIR but high SWIR reflectance, capturing spectral differences between river ice and snow more effectively. In contrast, RDRI failed to distinguish these subtle variations. Overall, RF demonstrated a greater capacity to handle complex spectral conditions and accurately identify snow-covered river ice.

4.3. Evaluation of Generalization Ability for River Ice Identification

Figure 13 presents the Kappa coefficient distribution of four methods (KNN, SVM, RF, and RDRI) across 20 validation regions on the Tibetan Plateau, reflecting their accuracy and generalization capabilities. Among them, RF achieved the highest average Kappa coefficient (0.9088) with the lowest standard deviation (0.0636) and no outliers, indicating consistently high accuracy and excellent generalization across diverse conditions.

SVM showed a slightly higher median than RF but with greater variability (std = 0.2133), suggesting that although it performed well in most regions, its accuracy dropped significantly in areas affected by deep snow. RDRI obtained a moderate average Kappa coefficient (0.7188) and standard deviation (0.2400) but performed poorly in regions with fine or mixed river ice. KNN exhibited the lowest performance, with an average Kappa coefficient of only 0.5486 and the highest variability (std = 0.4038), reflecting instability and poor adaptability across validation regions.

In conclusion, RF showed the strongest generalization ability for river ice identification on the complex terrain of the Tibetan Plateau, followed by SVM. Compared to the traditional RDRI index, both RF and SVM significantly improved accuracy and stability. KNN, due to its simple classification mechanism, was less effective under varying conditions.

5. Discussion

5.1. Comparison Between River Ice Identification Methods

In river ice identification on the Tibetan Plateau, the RF method outperforms the RDRI index in both accuracy and generalization ability. Although Li et al. effectively used the RDRI method to distinguish snow from river ice, the method struggled in deep snow and snow-covered ice regions [10]. RF, by contrast, distinguished deep snow from river ice with high accuracy. This study also visually evaluated the ability of machine learning methods to identify river ice at widths of 30–210 m, filling the gap left by Temimi et al., who did not evaluate the ability of machine learning methods to identify river ice [13]. Additionally, this study independently validated the performance of machine learning methods across six river ice characteristics (river type, width, etc.), offering a comprehensive evaluation. Compared with Han et al., who used RF and Landsat-8 OLI data to identify river ice in parts of the Han River [12], this study offers a broader evaluation of machine learning performance for different river ice characteristics on the Tibetan Plateau, providing a foundation for further application of machine learning in river ice identification.

Recent advances in deep learning have led to the successful application of models such as U-Net and other CNN-based architectures to river ice segmentation tasks. For example, Zhang et al. proposed ICENET, a semantic segmentation model that combines spatial and channel attention to improve the accuracy of river ice recognition in UAV remote sensing images [32]. Similarly, Singh et al. demonstrated that CNN-based models outperform thresholding methods in capturing the spatial complexity of river ice under different surface and illumination conditions [33]. Despite these promising developments, such deep learning models usually require large amounts of labeled training data and extensive computational resources, which are difficult to obtain in high-altitude regions such as the Tibetan Plateau. In contrast, Random Forest, the traditional machine learning approach used in this study, strikes a practical balance between accuracy and efficiency. It is well supported by the Google Earth Engine platform and is suitable for large-scale applications in environments like the Tibetan Plateau, where data are scarce and the terrain is complex.

5.2. Impact of Machine Learning Method Differences on River Ice Identification Enhancement

The study evaluated the enhancement capability of machine learning methods for river ice identification on the Tibetan Plateau. The results indicated significant differences among the enhancement effects of RF, SVM, and KNN on river ice identification, primarily stemming from fundamental distinctions between the methods.

Although KNN and SVM are classed as machine learning methods because of their classification and prediction capabilities, their underlying statistical principles and implementation differ from those of more modern learners. KNN has no explicit model training or parameter optimization step: it stores the training samples and labels each new pixel by the classes of its nearest neighbors, which sets it apart from modern machine learning practice. Because KNN's predictions rely on distance metrics, its generalization ability depends on the quantity and distribution of the training data [34,35]. Although this study considered diverse river ice samples, the number and distribution of samples for the different characteristics remained insufficient, which may have contributed to KNN's poor generalization in some validation areas. The SVM method classifies river ice by finding a separating hyperplane, which is not adjusted further once determined. In addition, SVM's computational cost grows rapidly with the size of the training dataset, making it less suitable for learning from large samples. This likely contributed to SVM's poor river ice identification in some areas of the Tibetan Plateau, given the extensive data used in this study. In contrast, the RF algorithm learns directly from data, handles nonlinear relationships, and scales well to large samples, which aligns well with modern machine learning practice. Its ensemble learning, in which many decision trees are trained on bootstrap samples with random feature subsets, makes it robust to the choice of features [30], resulting in superior identification accuracy across different feature subsets.
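The contrast with KNN can be made concrete with a bare-bones classifier in plain Python. There is no fitting step, only stored samples and a distance-based majority vote, so a pixel's label depends entirely on which training samples happen to lie nearby. This is an illustrative sketch, not the GEE implementation used in the study; the two-feature samples and their values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Classify one pixel by majority vote among its k nearest training samples.

    There is no training phase: the 'model' is just the stored sample list,
    so the result depends on how many samples exist near the query.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort stored samples by Euclidean distance to the query pixel.
    by_distance = sorted(train, key=lambda s: math.dist(s[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (e.g., two reflectance-based features).
train = [((0.10, 0.20), "ice"), ((0.12, 0.22), "ice"), ((0.11, 0.19), "ice"),
         ((0.80, 0.70), "snow"), ((0.82, 0.75), "snow")]
print(knn_predict(train, (0.10, 0.21)))  # → ice
```

If a class is represented by only a few samples, or none lie near a given region of feature space, the vote is decided by whatever happens to be closest; this is the sensitivity to sample quantity and distribution described above.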

Using the built-in functions of the GEE platform, the study assessed the feature importance of the RF model (Figure 14). The results indicate that RTI, NDVI, topographic features, and spatial location contribute most, highlighting their dominant role in river ice classification. These features allow the RF model to capture not only the spectral information of river ice but also topographic and spatial patterns. Traditional indices such as RDRI cannot exploit these types of information, which further explains the superior performance of the RF model under the complex environmental conditions of the Tibetan Plateau.

In summary, fundamental differences between the three machine learning methods lead to significant variations in their river ice identification enhancement capabilities on the Tibetan Plateau. The RF method demonstrated the strongest enhancement capability, while SVM and KNN had poor enhancement effects in some validation areas. Hence, selecting the appropriate machine learning method is crucial for improving river ice identification in high-altitude, complex terrains.

5.3. Uncertainty Analysis

Uncertainty in this study stems primarily from three sources. First, there is uncertainty in how well six characteristics represent the diversity of river ice on the Tibetan Plateau. Although the study used 20 validation images representing different river ice characteristics (e.g., river type, width, and elevation), fully capturing all river ice types across the Tibetan Plateau is challenging. For instance, whereas Li et al. [10] used nine images to validate the RDRI method across different ice periods, this study used only three, which may introduce uncertainty into the machine learning validation results.

Second, there is uncertainty in the accuracy comparison between the machine learning methods and the RDRI method. This study primarily used the Kappa coefficient as the accuracy metric for river ice identification and did not employ additional metrics for a more comprehensive evaluation. Although the Kappa coefficient corrects for chance agreement and is therefore less prone to misleading results under imbalanced class distributions, relying on it alone may not fully reflect how well the river ice and non-river ice classes are each identified, potentially biasing the accuracy conclusions.
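The difference between Kappa and overall accuracy, and the kind of per-class metrics that could complement them, can be seen from a binary confusion matrix. This is a minimal sketch; precision, recall, and F1 for the river ice class are common choices offered here as an assumption, not metrics the study reports, and the matrices are hypothetical.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Overall accuracy, Cohen's Kappa, and river-ice precision/recall/F1.

    tp/fn: reference river-ice pixels predicted as ice / non-ice;
    fp/tn: reference non-ice pixels predicted as ice / non-ice.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    p_o = (tp + tn) / n  # observed agreement = overall accuracy
    # Chance agreement from reference (row) and predicted (column) class totals.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")  # undefined if p_e == 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"oa": p_o, "kappa": kappa, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical matrices: a balanced scene, and one where 95% of pixels are
# non-ice and the classifier labels everything non-ice.
print(binary_metrics(tp=45, fp=10, fn=5, tn=40))
print(binary_metrics(tp=0, fp=0, fn=5, tn=95))  # OA is 0.95, yet Kappa is 0.0
```

In the second case, overall accuracy looks excellent while Kappa and river ice recall are both zero, which illustrates why Kappa is preferred under class imbalance and why per-class scores add information Kappa alone does not show.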

Third, there is uncertainty in training the machine learning models. Apart from the number of trees in the RF classifier, default settings were used for all classifiers, without parameter optimization. This configuration may introduce uncertainty into the cross-validated training accuracy of the models. Moreover, KNN and SVM are sensitive to feature scales, but this study did not normalize the feature data, which may have introduced uncertainty into the selection of the optimal feature set.
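Feature scale matters for distance-based classifiers because a feature such as elevation, which spans thousands of meters, dominates Euclidean distance over reflectance-based indices that lie roughly between -1 and 1. The sketch below shows z-score standardization fitted on training rows. The feature pair (elevation, NDSI) and all values are hypothetical, and z-scoring is only one common option (min-max scaling is another).

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_zscore(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation, fitted on training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so constant features do not cause division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(row: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def nearest_index(rows: Sequence[Sequence[float]], query: Sequence[float]) -> int:
    """Index of the training row closest to the query in Euclidean distance."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], query))


# Hypothetical training rows: (elevation in m, NDSI).
train = [(4500, 0.10), (4520, 0.90), (4400, 0.15), (4600, 0.85)]
query = (4518, 0.12)

means, stds = fit_zscore(train)
scaled_train = [apply_zscore(r, means, stds) for r in train]
print(nearest_index(train, query))  # raw: elevation dominates, picks row 1 (NDSI 0.90)
print(nearest_index(scaled_train, apply_zscore(query, means, stds)))  # scaled: row 0 (NDSI 0.10)
```

Without scaling, the nearest neighbor is chosen almost purely by elevation, even though its NDSI is very different from the query's. The means and standard deviations should be fitted on the training samples and then reused unchanged for validation pixels.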

5.4. Future Prospects

Although the study considered five categories of factors that influence river ice identification (topography, spectral, weather, spatial location, and texture) when constructing the feature set for the machine learning models, the contribution of each feature was not systematically evaluated beyond the built-in RF importance scores (Figure 14), owing to limitations of the GEE platform [36]. In future research, more thorough feature importance analysis could help eliminate redundant features [37], thereby improving the accuracy of river ice identification by machine learning models [38]. Moreover, analyzing feature importance could reveal which features play a critical role in river ice identification, thus enhancing the interpretability of the machine learning models [39].
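One model-agnostic way to extend feature importance analysis beyond a platform's built-in scores is permutation importance. Each feature column is shuffled in turn, and the resulting drop in accuracy is recorded. The sketch below is a swapped-in technique offered as an illustration, not the GEE built-in measure used for Figure 14. The toy model and data are hypothetical, and in practice the drop should be measured on held-out validation samples.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[Row]], List[int]],
                           X: List[Row], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean accuracy drop when one feature column at a time is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Only feature j is permuted; all other features keep their values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model that uses only feature 0 (e.g., a single index threshold).
def toy_predict(rows: List[Row]) -> List[int]:
    return [1 if r[0] > 0.5 else 0 for r in rows]


X = [(0.1, 5.0), (0.9, 3.0), (0.2, 1.0), (0.8, 4.0), (0.3, 2.0), (0.7, 6.0)]
y = [0, 1, 0, 1, 0, 1]
print(permutation_importance(toy_predict, X, y))  # feature 1 scores exactly 0.0
```

A feature the model ignores receives zero importance, which is the property that makes this kind of analysis useful for identifying redundant inputs.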

Although the validation images for the six river ice characteristics could be further supplemented, we believe the results offer valuable references for river ice identification on the Tibetan Plateau, particularly for narrow rivers (0–90 m) and under deep snow conditions. As more sample data are collected and additional validation images are incorporated, future research will further verify our conclusions on the capability of machine learning to enhance river ice identification on the Tibetan Plateau.

Lastly, additional remote sensing data, such as atmospheric auxiliary data and SAR data, can be incorporated into the study [40]. By integrating multiple data sources and advanced methods like deep learning and transfer learning, there is potential to further improve the ability of machine learning to identify river ice on the Tibetan Plateau under challenging weather conditions (e.g., cloud cover) and even provide information on river ice thickness and sediment content.

6. Conclusions

Using three machine learning methods (RF, SVM, and KNN), this study comprehensively evaluated how machine learning enhances river ice identification in the high-altitude, complex terrain of the Tibetan Plateau from six aspects: river type, altitude, river width, ice period, satellite data, and snow interference. It also explained how machine learning methods improve river ice identification compared with traditional index-based methods. The results show that:

(1): To address the shortcomings of traditional methods, the study introduces three machine learning methods, namely SVM, KNN, and RF, to construct a river ice identification model integrating multi-source features, which significantly improves the accuracy and stability of river ice extraction given the complex river ice characteristics of the Tibetan Plateau.
(2): The RF model performs best under all test conditions, with an average Kappa coefficient of 0.9088, outperforming other machine learning methods and significantly surpassing the traditional index-based method.
(3): The RF method has higher recognition accuracy in curved river segments, bifurcated river channels, fine river ice, and areas disturbed by deep snow. In particular, for narrow rivers (0–90 m) and different image sources (e.g., Landsat 7), RF extraction performs stably and has strong generalization ability.
(4): Overall, machine learning methods, particularly the RF model, effectively extract information from multi-dimensional features through ensemble learning, considering weather, topography, and spectral factors. This approach overcomes the limitations of traditional methods that overly rely on spectral values and thresholds, significantly improving the accuracy and generalization ability of river ice identification in high-altitude, complex terrains.

Author Contributions

Conceptualization, X.P. and H.L.; methodology, X.P. and H.L.; software, X.P.; validation, X.P. and H.L.; formal analysis, X.P., H.L. and H.R.; investigation, X.P., H.L., Y.Y. and Q.Z.; resources, X.P., H.L., Y.Y., Q.Z., Y.L. and X.H.; data curation, X.P. and H.L.; writing—original draft preparation, X.P.; writing—review and editing, X.P. and H.L.; visualization, X.P. and H.L.; supervision, Y.Y. and L.N.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Numbers: U22A20564) and the Program of the State Key Laboratory of Cryospheric Science and Frozen Soil Engineering, CAS (Grant Numbers: CSFSE-ZZ-2409).

Data Availability Statement

The elevation data, vector data, river data, and satellite imagery of the Tibetan Plateau used in this study are all publicly available. The sample data used in this study are available upon request.

Acknowledgments

The authors appreciate all the data provided by each open database. The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions on this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Beltaos, S.; Burrell, B. Hydrotechnical Advances in Canadian River Ice Science and Engineering during the Past 35 Years. Can. J. Civ. Eng. 2015, 42, 583–591.
2. Magnuson, J.J.; Robertson, D.M.; Benson, B.J.; Wynne, R.H.; Livingstone, D.M.; Arai, T.; Assel, R.A.; Barry, R.G.; Card, V.; Kuusisto, E.; et al. Historical Trends in Lake and River Ice Cover in the Northern Hemisphere. Science 2000, 289, 1743–1746.
3. Tedesco, M. Remote Sensing of the Cryosphere; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2015; pp. 1–16. ISBN 978-1-118-36890-9.
4. Yang, X.; Pavelsky, T.M.; Allen, G.H. The Past and Future of Global River Ice. Nature 2020, 577, 69–73.
5. Kang, S.; Guo, W.; Wu, T.; Zhong, X.; Chen, X.; Min, X.; Jinlei, C.; Ruimin, Y. Cryospheric Changes and Their Impacts on Water Resources in the Belt and Road Regions. Adv. Earth Sci. 2020, 35, 1–17.
6. Burrell, B.C.; Beltaos, S.; Turcotte, B. Effects of Climate Change on River-Ice Processes and Ice Jams. Int. J. River Basin Manag. 2023, 21, 421–441.
7. Lindenschmidt, K.-E. River Ice Processes and Ice Flood Forecasting; Springer: Berlin/Heidelberg, Germany, 2020.
8. Cooley, S.W.; Pavelsky, T.M. Spatial and Temporal Patterns in Arctic River Ice Breakup Revealed by Automated Ice Detection from MODIS Imagery. Remote Sens. Environ. 2016, 175, 310–322.
9. Li, H.; Li, H.; Wang, J.; Hao, X. Monitoring High-Altitude River Ice Distribution at the Basin Scale in the Northeastern Tibetan Plateau from a Landsat Time-Series Spanning 1999–2018. Remote Sens. Environ. 2020, 247, 111915.
10. Li, H.; Li, H.; Wang, J.; Hao, X. Identifying River Ice on the Tibetan Plateau Based on the Relative Difference in Spectral Bands. J. Hydrol. 2021, 601, 126613.
11. Wang, X.; Feng, L. Patterns and Trends in Northern Hemisphere River Ice Phenology from 2000 to 2021. Remote Sens. Environ. 2024, 313, 114346.
12. Han, H.; Kim, T.; Kim, S. River Ice Mapping from Landsat-8 OLI Top of Atmosphere Reflectance Data by Addressing Atmospheric Influences with Random Forest: A Case Study on the Han River in South Korea. Remote Sens. 2024, 16, 3187.
13. Temimi, M.; Abdelkader, M.; Tounsi, A.; Chaouch, N.; Carter, S.; Sjoberg, B.; Macneil, A.; Bingham-Maas, N. An Automated System to Monitor River Ice Conditions Using Visible Infrared Imaging Radiometer Suite Imagery. Remote Sens. 2023, 15, 4896.
14. Madaeni, F.; Chokmani, K.; Lhissou, R.; Gauthier, Y.; Tolszczuk-Leclerc, S. Convolutional Neural Network and Long Short-Term Memory Models for Ice-Jam Predictions. Cryosphere 2022, 16, 1447–1468.
15. Li, Z.; Yu, G.; Xu, M.; Hu, X.; Yang, H.; Hu, S. Progress in Studies on River Morphodynamics in Qinghai-Tibet Plateau. Adv. Water Sci. 2016, 27, 617–628.
16. Su, F.; Zhang, L.; Ou, T.; Chen, D.; Yao, T.; Tong, K.; Qi, Y. Hydrological Response to Future Climate Changes for the Major Upstream River Basins in the Tibetan Plateau. Glob. Planet. Chang. 2016, 136, 82–95.
17. Qin, X.; Sun, J.; Chen, T. Study on Spatiotemporal Variation of Temperature and Precipitation in Qinghai-Tibetan Plateau from 1974 to 2013. J. Chengdu Univ. (Nat. Sci. Ed.) 2015, 34, 191–195.
18. Immerzeel, W.W.; Van Beek, L.P.H.; Bierkens, M.F.P. Climate Change Will Affect the Asian Water Towers. Science 2010, 328, 1382–1385.
19. Zhang, Y.; Li, B.; Liu, L.; Zheng, D. Redetermine the Region and Boundaries of Tibetan Plateau. Geogr. Res. 2021, 40, 1543–1553.
20. Wulder, M.A.; Roy, D.P.; Radeloff, V.C.; Loveland, T.R.; Anderson, M.C.; Johnson, D.M.; Healey, S.; Zhu, Z.; Scambos, T.A.; Pahlevan, N.; et al. Fifty Years of Landsat Science and Impacts. Remote Sens. Environ. 2022, 280, 113195.
21. Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A State-of-the-Art Global Reanalysis Dataset for Land Applications. Earth Syst. Sci. Data 2021, 13, 4349–4383.
22. Buckley, S. NASADEM Merged DEM Global 1 Arc Second V001 [Data Set]; NASA EOSDIS Land Processes DAAC; USGS: Reston, VA, USA, 2020.
23. Lehner, B.; Grill, G. Global River Hydrography and Network Routing: Baseline Data and New Approaches to Study the World's Large River Systems. Hydrol. Process. 2013, 27, 2171–2186.
24. Ye, Q.; Zong, J.; Tian, L.; Cogley, J.G.; Song, C.; Guo, W. Glacier Changes on the Tibetan Plateau Derived from Landsat Imagery: Mid-1970s–2000–13. J. Glaciol. 2017, 63, 273–287.
25. Zhang, G.; Yao, T.; Chen, W.; Zheng, G.; Shum, C.K.; Yang, K.; Piao, S.; Sheng, Y.; Yi, S.; Li, J.; et al. Regional Differences of Lake Evolution across China during 1960s–2015 and Its Natural and Anthropogenic Causes. Remote Sens. Environ. 2019, 221, 386–404.
26. Phan, T.N.; Kuch, V.; Lehnert, L.W. Land Cover Classification Using Google Earth Engine and Random Forest Classifier: The Role of Image Composition. Remote Sens. 2020, 12, 2411.
27. Ali, A.; Shamsuddin, S.M.; Ralescu, A.L. Classification with Class Imbalance Problem. Int. J. Advance Soft Compu. Appl. 2013, 5, 176–204.
28. Mountrakis, G.; Im, J.; Ogole, C. Support Vector Machines in Remote Sensing: A Review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
29. Laaksonen, J.; Oja, E. Classification with Learning K-Nearest Neighbors. In Proceedings of the International Conference on Neural Networks (ICNN'96), Washington, DC, USA, 3–6 June 1996; Volume 3, pp. 1480–1483.
30. Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
31. Stehman, S.V. Selecting and Interpreting Measures of Thematic Classification Accuracy. Remote Sens. Environ. 1997, 62, 77–89.
32. Zhang, X.; Jin, J.; Lan, Z.; Li, C.; Fan, M.; Wang, Y.; Yu, X.; Zhang, Y. ICENET: A Semantic Segmentation Deep Network for River Ice by Fusing Positional and Channel-Wise Attentive Features. Remote Sens. 2020, 12, 221.
33. Singh, A.; Kalke, H.; Loewen, M.; Ray, N. River Ice Segmentation with Deep Learning. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7570–7579.
34. Vapnik, V.N. An Overview of Statistical Learning Theory. IEEE Trans. Neural Netw. 1999, 10, 988–999.
35. Qian, Y.; Zhou, W.; Yan, J.; Li, W.; Han, L. Comparing Machine Learning Classifiers for Object-Based Land Cover Classification Using Very High Resolution Imagery. Remote Sens. 2014, 7, 153–168.
36. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350.
37. Georganos, S.; Grippa, T.; Vanhuysse, S.; Lennert, M.; Shimoni, M.; Kalogirou, S.; Wolff, E. Less Is More: Optimizing Classification Performance through Feature Selection in a Very-High-Resolution Remote Sensing Object-Based Urban Application. GIScience Remote Sens. 2018, 55, 221–242.
38. Schratz, P.; Muenchow, J.; Iturritxa, E.; Cortés, J.; Bischl, B.; Brenning, A. Monitoring Forest Health Using Hyperspectral Imagery: Does Feature Selection Improve the Performance of Machine-Learning Techniques? Remote Sens. 2021, 13, 4832.
39. Guo, J.; Zhou, X.; Li, J.; Plaza, A.; Prasad, S. Superpixel-Based Active Learning and Online Feature Importance Learning for Hyperspectral Image Analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 347–359.
40. Zhang, X.; Yue, Y.; Han, L.; Li, F.; Yuan, X.; Fan, M.; Zhang, Y. River Ice Monitoring and Change Detection with Multi-Spectral and SAR Images: Application over Yellow River. Multimed. Tools Appl. 2021, 80, 28989–29004.

Figure 1. Distribution of the study area and validation zones in the Tibetan Plateau. (a) Location of the Tibetan Plateau on the global map. (b) Distribution of the sampling sites on the Tibetan Plateau used in the study. (c) Topography of the Tibetan Plateau, with the major rivers and validation regions. In (c), the black dots represent the locations of the 20 validation regions, the red letters are the labels of each validation region, and the blue lines represent the major rivers on the Tibetan Plateau. The location, elevation, and temperature of each of the 20 validation areas were taken at the coordinates of its center.

Figure 2. Satellite images of the validation zones on the Tibetan Plateau. The red letters in the figure are the labels of the validation points. The black text indicates the river ice characteristics of each validation point, and the text at the top indicates the six river ice characteristics. Except for the images used to compare different satellite data, all satellite images are from Landsat 8. The red numbers in the satellite images are the image labels and indicate the image dates.

Figure 3. Overall plan and key technical workflow.

Figure 4. River ice identification results for machine learning methods and RDRI index across different river types. The validation areas A, B, and C represent river ice in three types of river channels: straight, curved, and braided. Truth represents the actual condition of the river ice in the image, the blue part indicates the visual interpretation of the river ice, and the red numbers indicate the date of image acquisition. The blue part of the image on the right side of Truth is the result of the different methods of river ice identification, and the black numbers in the subfigure indicate the Kappa coefficient. Mean represents the average Kappa coefficient for each method across different river types.

Figure 5. River ice identification results for machine learning methods and RDRI index across different elevations. The validation areas D, E, F, and G represent river ice at different elevation gradients, with E indicating elevation. Truth represents the actual condition of the river ice in the image, the blue part represents the result of visual interpretation of the river ice, and the red number represents the date of acquisition of the image. Blue and red parts of the image on the right side of Truth are the results of different methods of river ice identification, and the black numbers in the subfigure represent the Kappa coefficient. Mean represents the average Kappa coefficient for each method across different elevations.

Figure 6. The horizontal axis represents the percentage of area error, and the vertical axis represents the elevation. The percentage of area error is calculated by dividing the difference between the river ice area identified by each method and the river ice area obtained from visual interpretation by the area from visual interpretation.
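The area error in Figure 6 is a signed relative difference. Positive values indicate over-identification and negative values under-identification relative to visual interpretation. This is a minimal sketch; the helper names are hypothetical, and converting pixel counts with 30 m Landsat pixels (900 m², i.e., 0.0009 km²) is an assumption about how the areas were obtained.

```python
def pixels_to_km2(pixel_count: int, pixel_size_m: float = 30.0) -> float:
    """Area covered by square pixels; 30 m is the Landsat 8 multispectral resolution."""
    return pixel_count * pixel_size_m ** 2 / 1e6


def area_error_percent(method_area: float, reference_area: float) -> float:
    """(identified area - visually interpreted area) / visually interpreted area x 100."""
    if reference_area <= 0:
        raise ValueError("reference area must be positive")
    return 100.0 * (method_area - reference_area) / reference_area


# Hypothetical pixel counts for one validation area.
reference = pixels_to_km2(1000)   # visual interpretation
identified = pixels_to_km2(1100)  # one classifier's output
print(area_error_percent(identified, reference))  # about +10 (over-identification)
```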

Figure 7. River ice identification results across different river widths for machine learning methods and the RDRI index. The validation areas H, I, J, and K represent river ice at different river widths. W denotes river width; its value is the width shared by the largest number of pixels in the validation image (i.e., the modal width). Truth represents the actual condition of the river ice in the image, the blue part indicates the visual interpretation of the river ice, and the red numbers indicate the date of image acquisition. The blue part of the image on the right side of Truth is the result of the different methods of river ice identification, and the black numbers in the subfigure indicate the Kappa coefficient. Mean represents the average Kappa coefficient for each method across different river widths.
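The representative width W described for Figure 7 is the mode of the per-pixel widths. The sketch assumes, hypothetically, that a channel width value is available for every river pixel in the validation image; the caption does not say how those widths were derived.

```python
from collections import Counter
from typing import Iterable


def modal_river_width(pixel_widths_m: Iterable[float]) -> float:
    """Width shared by the most river pixels; ties go to the width seen first."""
    counts = Counter(pixel_widths_m)
    if not counts:
        raise ValueError("no river pixels supplied")
    width, _ = counts.most_common(1)[0]
    return width


# Hypothetical per-pixel channel widths (m) for one validation image.
print(modal_river_width([30, 60, 30, 90, 30, 60]))  # → 30
```

Using the mode rather than the mean keeps a few very wide or very narrow reaches from shifting the label assigned to the whole validation area.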

Figure 8. Overall accuracy of river ice identification across river widths of 0–210 m for machine learning methods and RDRI Index.

Figure 9. River ice identification results for machine learning methods and RDRI index across different ice periods. The validation areas L1–L3 represent river ice during different ice periods: formation, peak, and melting. Truth represents the actual condition of the river ice in the image, the blue part indicates the visual interpretation of the river ice, and the red numbers indicate the date of image acquisition. The blue part of the image on the right side of Truth is the result of the different methods of river ice identification, and the black numbers in the subfigure indicate the Kappa coefficient. Mean represents the average Kappa coefficient for each method across different ice periods.

Figure 10. River ice identification results across different satellite data for machine learning methods and the RDRI index. The validation areas M1–M3 represent river ice identified using different satellite data. Truth represents the actual condition of the river ice in the image, the blue part represents the result of visual interpretation of the river ice, and the red number represents the date of acquisition of the image. Blue and red parts of the image on the right side of Truth are the results of different methods of river ice identification, and the black numbers in the subfigure represent the Kappa coefficient. Mean represents the average Kappa coefficient for each method across different satellite data.

Figure 11. River ice identification results of machine learning methods and the RDRI index under different snow cover conditions. The validation areas N, O, and P represent river ice under different snow cover conditions: thin snow (N), deep snow (O), and snow-covered ice (P). Truth represents the actual condition of the river ice in the image, the blue part indicates the visual interpretation of the river ice, and the red numbers indicate the date of image acquisition. The blue part of the image on the right side of Truth is the result of the different methods of river ice identification, and the black numbers in the subfigure indicate the Kappa coefficient. Mean represents the average Kappa coefficient for each method across different snow cover conditions.

Figure 12. Projections of river ice identification for RF and RDRI in snow-covered ice area. Red scatter points represent river ice data identified by the RDRI method, with each point indicating the spectral values at a specific coordinate. Green scatter points represent river ice data identified by machine learning methods, such as RF. The X, Y, and Z axes represent the red (Red), near-infrared (NIR), and shortwave infrared (SWIR) bands, respectively. These bands were selected for their importance in river ice identification.

Figure 13. Kappa coefficient distribution of machine learning methods and the RDRI index in all validation areas. The orange, red, green, and blue boxes represent the 50% distribution range (interquartile range) of the Kappa coefficient for KNN, SVM, RF, and RDRI in the 20 validation regions, respectively. The horizontal lines inside the boxes represent the median Kappa coefficient, and the upper and lower edges of the boxes represent the upper and lower quartiles. The circles represent extreme outliers for each method. The black lines are the error bars for each method, with the mean indicating the average Kappa coefficient in the validation regions and std representing the standard deviation for each method.

Figure 14. Feature importance in river ice classification (RF). The figure presents the relative importance values of each input variable, with higher scores indicating a greater contribution to the model’s classification performance. Variables are sorted in increasing order of importance.

Table 1. Feature set.

Feature	Explanation	Category
Precipitation	Total precipitation (m)	Climate
Temperature	Average temperature (°C)	Climate
Slope	Slope of the terrain, indicating the steepness of the surface	Terrain
Elevation	Altitude (m)	Terrain
Aspect	Aspect of the terrain, indicating the orientation of the slope	Terrain
NDWI	Normalized Difference Water Index, (SR_B3 − SR_B5)/(SR_B3 + SR_B5)	Spectral
NDSI	Normalized Difference Snow Index, (SR_B3 − SR_B6)/(SR_B3 + SR_B6)	Spectral
NDVI	Normalized Difference Vegetation Index, (SR_B5 − SR_B4)/(SR_B5 + SR_B4)	Spectral
NDBI	Normalized Difference Built-up Index, (SR_B6 − SR_B5)/(SR_B6 + SR_B5)	Spectral
RDRI	Relative Difference River Ice Index, (SR_B4 − SR_B5)/(SR_B5 + SR_B6)	Spectral
RTI	Reflectance Threshold Index, (SR_B4 − SR_B5)	Spectral
SR_B5	Reflectance of Band 5 (Near Infrared) from Landsat 8	Spectral
SR_B4	Reflectance of Band 4 (Red) from Landsat 8	Spectral
SR_B3	Reflectance of Band 3 (Green) from Landsat 8	Spectral
SR_B2	Reflectance of Band 2 (Blue) from Landsat 8	Spectral
SR_B7	Reflectance of Band 7 (Shortwave Infrared 2) from Landsat 8	Spectral
SR_B6	Reflectance of Band 6 (Shortwave Infrared 1) from Landsat 8	Spectral
SR_B5_contrast	Texture feature of Band 5: Contrast	Texture
SR_B5_corr	Texture feature of Band 5: Correlation	Texture
SR_B5_var	Texture feature of Band 5: Variance	Texture
SR_B5_ent	Texture feature of Band 5: Entropy	Texture
LON	Longitude of the pixel	Spatial Position
LAT	Latitude of the pixel	Spatial Position
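The spectral indices in Table 1 are simple band arithmetic, and writing them out makes it easy to check each formula. This is a per-pixel sketch assuming surface reflectance already on a 0 to 1 scale. For Landsat Collection 2 Level-2 digital numbers, the documented scale factor of 0.0000275 and offset of -0.2 would be applied first. The pixel values in the example are hypothetical.

```python
from typing import Dict

NAN = float("nan")


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN where the denominator is zero (e.g., masked pixels)."""
    return num / den if den != 0 else NAN


def spectral_indices(px: Dict[str, float]) -> Dict[str, float]:
    """Table 1 indices for one pixel of Landsat 8 surface reflectance (0-1 scale)."""
    b3, b4, b5, b6 = px["SR_B3"], px["SR_B4"], px["SR_B5"], px["SR_B6"]
    return {
        "NDWI": _ratio(b3 - b5, b3 + b5),
        "NDSI": _ratio(b3 - b6, b3 + b6),
        "NDVI": _ratio(b5 - b4, b5 + b4),
        "NDBI": _ratio(b6 - b5, b6 + b5),
        "RDRI": _ratio(b4 - b5, b5 + b6),  # denominator is NIR + SWIR1, not red + NIR
        "RTI": b4 - b5,                    # plain difference, no normalization
    }


# Hypothetical reflectances for a single pixel.
print(spectral_indices({"SR_B3": 0.30, "SR_B4": 0.25, "SR_B5": 0.10, "SR_B6": 0.05}))
```

RDRI differs from the four normalized-difference indices in that its numerator (red minus NIR) and denominator (NIR plus SWIR1) use different band pairs. RTI keeps the same numerator without normalizing it.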

Table 2. Accuracy of machine learning methods under different classification strategies.

Feature Set	RF	KNN	SVM	Feature Combination
Feature Subset 1	0.9701	0.9655	0.8828	Spectral
Feature Subset 2	0.9748	0.9755	0.8851	Spectral + Climate
Feature Subset 3	0.9813	0.7739	0.8016	Spectral + Climate + Terrain
Feature Subset 4	0.9837	0.7792	0.8259	Spectral + Climate + Terrain + Longitude/Latitude
Feature Subset 5	0.9834	0.7864	0.8258	Spectral + Climate + Terrain + Texture + Longitude/Latitude
Average	0.9787	0.8561	0.8442	–
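The Average row and the best-performing subset per classifier can be recomputed from the table values. This doubles as a quick consistency check on the Average row. Treating "highest score per classifier" as the selection rule for the optimal feature combination is an assumption made for illustration.

```python
from statistics import fmean
from typing import Dict, List, Tuple

# Scores transcribed from Table 2, Feature Subsets 1-5 in order.
TABLE2: Dict[str, List[float]] = {
    "RF": [0.9701, 0.9748, 0.9813, 0.9837, 0.9834],
    "KNN": [0.9655, 0.9755, 0.7739, 0.7792, 0.7864],
    "SVM": [0.8828, 0.8851, 0.8016, 0.8259, 0.8258],
}


def summarize_subsets(table: Dict[str, List[float]]) -> Dict[str, Tuple[float, int]]:
    """Mean score per classifier and the 1-based number of its best feature subset."""
    out = {}
    for method, scores in table.items():
        best = max(range(len(scores)), key=scores.__getitem__)
        out[method] = (fmean(scores), best + 1)
    return out


for method, (mean, best) in summarize_subsets(TABLE2).items():
    print(f"{method}: mean {mean:.4f}, best = Feature Subset {best}")
```

The recomputed values show RF peaking with Feature Subset 4, while KNN and SVM both peak with Feature Subset 2 and decline once terrain features are added.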

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

