Lake Phenology of Freeze-Thaw Cycles Using Random Forest: A Case Study of Qinghai Lake

: Lake phenology is essential for understanding the lake freeze-thaw cycle e ﬀ ects on terrestrial hydrological processes. The Qinghai-Tibetan Plateau (QTP) has the most extensive ice reserve outside of the Arctic and Antarctic poles and is a sensitive indicator of global climate changes. Qinghai Lake, the largest lake in the QTP, plays a critical role in climate change. The freeze-thaw cycles of lakes were studied using daily Moderate Resolution Imaging Spectroradiometer (MODIS) data ranging from 2000–2018 in the Google Earth Engine (GEE) platform. Surface water / ice area, coverage, critical dates, surface water, and ice cover duration were extracted. Random forest (RF) was applied with a classiﬁer accuracy of 0.9965 and a validation accuracy of 0.8072. Compared with six common water indexes (tasseled cap wetness (TCW), normalized di ﬀ erence water index (NDWI), modiﬁed normalized di ﬀ erence water index (MNDWI), automated water extraction index (AWEI), water index 2015 (WI2015) and multiband water index (MBWI)) and ice threshold value methods, the critical freeze-up start (FUS), freeze-up end (FUE), break-up start (BUS), and break-up end (BUE) dates were extracted by RF and validated by visual interpretation. The results showed an R 2 of 0.99, RMSE of 3.81 days, FUS and BUS overestimations of 2.50 days, and FUE and BUE underestimations of 0.85 days. RF performed well for lake freeze-thaw cycles. From 2000 to 2018, the FUS and FUE dates were delayed by 11.21 and 8.21 days, respectively, and the BUS and BUE dates were 8.59 and 1.26 days early, respectively. Two novel key indicators, namely date of the ﬁrst negative land surface temperature (DFNLST) and date of the ﬁrst positive land surface temperature (DFPLST), were proposed to comprehensively delineate lake phenology: DFNLST was approximately 37 days before FUS, and DFPLST was approximately 20 days before BUS, revealing that the ﬁrst negative and ﬁrst positive land surface temperatures occur increasingly earlier.

the usage of machine learning was still in the exploratory stage. The visual interpretation method is s useful validation method in the absence of in-situ observation [14,[33][34][35][36]. Fortunately, the Google Earth Engine (GEE) platform provides us with an opportunity to simultaneously extract the daily surface water area and ice area of lakes based on remote sensing data [37].
The objective of the study was establishing or characterizing the lake phenology of Qinghai Lake based on MODIS remote sensing data using the RF algorithm in the GEE platform. Results were compared with six common water indexes (TCW, NDWI, MNDWI, MBWI, WI2015, and AWEI) and the ice threshold value methods. After applying cloud filtering processes to address the affected values of daily lake surface water/ice/cloud coverage, the four critical ice phenology dates could be correctly obtained. The visual interpretation method was used to validate the above date results. Last, the results of RF, trends in the date of critical ice phenology dates and two key proposed indicators, date of the first negative LST (DFNLST) and date of the first positive LST (DFPLST) (Appendix A), were discussed.
The paper is organized as follows. Section 2 describes the study area, data, and methods. Section 3 presents the RF classification, comparison of RF and water/ice extraction methods, lake phenology, and validation results. Section 4 illustrates the discussion of results. Section 5 concludes the whole study.

Study Area
Qinghai Lake is the largest lake in China, located in an endorheic basin on the QTP, and classified as a saline and alkaline lake formed in an intermontane tectonic depression of the northeast margin of the QTP [38]. Qinghai Lake extends to cover 36 • 32 -37 • 15 N, 99 • 36 -100 • 46 E [39] and has a surface water area of 4321.00 km 2 and an elevation of 3193.80 m a.s.l. as of 2010 [40] (Figure 1). The average lake depth is 21.00 m, and the maximum depth is 25.50 m, as measured in 2008 [38]. The lake's catchment is prismatic in shape, with a west-east length of approximately 106 km, a north-south width of roughly 63 km, and a perimeter of about 360 km. Twenty-three rivers and streams empty into Qinghai Lake, most of which are seasonal. Five permanent streams provide 80% of the total influx: the Buha, Shaliu, Haergai, Quanji, and Heima Rivers. The main tributary is the Buha River, which contributes approximately half of the annual water input [41]. The lake fluctuated in area, it shrank over the 20th century, has increased since 2004 [42,43].
The annual mean precipitation is 300 to 400 mm. The mean yearly evaporation is approximately 1300 to 2000 mm. The rainfall from May through September accounts for 90% of the full year's precipitation, and the daily mean temperature is 2.8 • C [44]. Qinghai Lake begins to freeze up in late November, and begins to break up in late March, and all ice melts off by late April [39,[45][46][47].

Data
The MODIS is a payload imaging sensor built by Santa Barbara Remote Sensing that was launched into orbit by the National Aeronautics and Space Administration (NASA) on 18 December 1999 onboard the Terra (Earth Observation Satellite, EOS AM) satellite [48] and 4 May 2002 onboard the Aqua (EOS PM). The sensor is a multispectral cross-track scanning radiometer that operates in the visible band through the thermal infrared band. As a multidisciplinary instrument, MODIS is designed to measure land surface features on a global basis every day, and understand the global earth system as a whole and the interactions among various processes. Data are captured in 36 spectral bands, with wavelengths ranging from 0.4-14.5 µm and spatial resolutions with varying differences, that is, two bands (1-2) at 250 m, five bands (3)(4)(5)(6)(7) at 500 m and 29 bands  at 1 km, and the information of the former seven bands are shown in Table 1 in the GEE platform [49]. MODIS utilizes four onboard calibrators to provide in-flight calibration, solar diffuser (SD), solar diffuser stability monitor (SDSM), spectral radiometric calibration assembly (SRCA), and a v-groove black body.  The daily surface reflectance was Terra Surface Reflectance Daily Global 250 m (MOD09GQ version 6 provides bands 1 and 2 at 250 m resolution) and Terra Surface Reflectance Daily L2G Global 1 km and 500 m (MOD09GA version 6 provides bands 3-7 at 500 m resolution) in the GEE platform [46][47][48]. Due to its consistent resolution of 250-500 m and the highest spatial resolution in MODIS, band products were added to the RF classifier in GEE. The data time availability was from 24 February 2000 to 31 December 2018.

Training Data and Testing Data
The training data and testing data were derived from MODIS products: "Terra Surface Reflectance Daily Global 250 m" and "Terra Surface Reflectance Daily L2G Global 1 km and 500 m". The four seasons were studied, with their respective months, include Spring: March-May; Summer: June-August; Autumn: September-November; and Winter: December-February. In general, the FUS event of lake surface water generally occurs in November, and the BUE event of lake ice is in April. Four images with little cloud cover and snow-free MODIS in October, January, April, and July were selected each year to represent the critical sample events. From the 24 February 2000 to 31 December 2018, one hundred sample pixels were manually selected in each image (Table 2). Since the water mask was the most considerable extent in a year, there still were land surface pixels in the water mask. Snow may cover the land surface or lake surface. The snow-covered land surface could be classed as the land, and the snow-covered lake surface could be classed as the ice to reduce the classes. Sample pixels were divided into four classes: water pixel class was labeled 1, ice pixel class was labeled 2, land pixel class was labeled 3, and cloud pixel class was labeled 4. Additionally, if the MODIS images showed blurry sample pixels in the seven-band features, its sample pixel lacks one or several bands' spectral value. The preceding or subsequent images were used instead of the current images. For example, a MODIS image showed some blurry pixels on the 15 April 2001, but the pixels of the next image were normal on the 16 April 2001, which were used in place of the image on the 15th. Above normal image didn't contain blurry pixels, sample pixels have all bands' spectral value. At the same time, the water and ice pixels with thin cloud cover were also labeled 1 and 2. Since the RGB three bands could be visually recognized as water and ice, the machine recognizes the seven bands. The purpose was to obtain more water and ice classification results when the daily image was classified by RF classifier, but the thin cloud may also bring the errors. Snow-free and snow-covered land surface were labeled 3, and cloud cover land surface was labeled 4. Above procedures created the training and testing dataset, the total number of sample pixels was 7776, including water, ice, land, and cloud pixels, with two out of three (5114) randomly assigned to the training sample dataset and one in three (2662) assigned to the testing sample dataset.

Land Surface Temperature
The LST dataset was the Terra Land Surface Temperature and Emissivity Daily Global 1 km (MOD11A1.006). It provided daily LST and emissivity values in 1200 × 1200 grids. The exact grid size at 1 km spatial resolution is 0.928 km by 0.928 km. The daily mean temperature value was derived from daily day and night grid temperature products within the annual water mask. And the relationship between the critical ice phenology and temperature was analyzed. The data time availability was from 5 March 2000 to 31 December 2018 [51,52].

Water Mask and SRTM
The water mask used was the Terra Land Water Mask Derived from MODIS and SRTM (Shuttle Radar Topography Mission) Yearly Global 250 m (MOD44W V6 land/water mask 250 m product) in the GEE platform, with a data time availability from 2000 to 2015 and a 1-year cadence [53].
The SRTM digital elevation dataset was originally produced to provide consistent, high-quality elevation data at near global scope. Using new interpolation algorithms and better auxiliary digital elevation models (DEMs) [54], the spatial resolution is 90 m [55], available from the CGIAR-CSI SRTM 90 m Database (http://srtm.csi.cgiar.org).

RF Algorithm
The RF algorithm is a classification and regression algorithm originally designed for machine learning [56]. This algorithm is being increasingly applied to satellite and aerial image classification and the creation of continuous field datasets, such as percent tree cover and biomass [57]. RF has several advantages over other image classification methods [58]: RF is nonparametric, capable of using continuous and categorical datasets, easy to parameterize, not sensitive to overfitting, good at dealing with outliers in training data and calculates other information such as classification error and variable importance [31,59,60]. RF is an ensemble model that uses the results from many different models to calculate a response. In most cases, the results of an ensemble model will be better than the results from any individual model [61]. In the case of RF, several decision trees (DTs) are created, and the response is calculated based on the outcome of all DTs [62][63][64]. A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch presents the outcome of the test, and each leaf node represents a class label.
Since forests are made of trees, and the random forest algorithm is the combination of many decision trees. The number of trees is one of the most important factors in the RF variables. In general, the number of RF trees was set to 50 or 100. The random forest algorithm includes six parameters, (1) numberofTrees: the number of decision trees to create per class; (2) variablesPerSplit: the number of variables per split, default is the square root of the number of numbers; (3) minLeafPoplation: the minimum size of a terminal node, default is 1; (4) bagFraction: the fraction of input to bag per tree, default is 0.5; (5) outofBagMode: whether the classifier should run in out-of-bag mode, default is false; (6) seed: random seed, default is 0.
The classifier accuracy and validation accuracy for RF were different from the number of RF trees different, especially the validation accuracy. The input data were 5114 training samples dataset and 2662 testing samples dataset, and the variables were four classes (water, ice, land, and cloud). Setting variablesPerSplit, minLeafPoplation, bagFraction, outofBagMode, and seed variables of RF classifier in GEE to the default condition, many trials were conducted from 10 to 500 steps of 10 for numberofTrees variable to select the suitable and performable number of RF trees. The results are shown in Figure A1. The accuracy of the RF classifier was greater than 0.9500, except for ten trees. When the number of trees was greater than 100, the classifier accuracy was close to 1.0000. The RF classifier's validation accuracy was between 0.7472 and 0.8072, except for ten trees, which was 0.6694.
Specifically, the validation accuracy of the RF classifier is greater than 0.7818, which ranges from 70 to 100. The largest validation accuracy was 0.8072 in 90 trees. Many trials were taken 1 step to illustrate the validation accuracy for the number of trees that ranged 10-40 and 70-100. And the results were shown in Figure A1b,c. When the number of trees was less than 40, the classifier accuracy showed a gradual upward trend, and the greatest classifier accuracy was 0.9894 in 38 trees. When the number of trees was greater than 6, the classifier accuracy was greater than a constant 0.9000. Validation accuracy showed a trend of increasing volatility. The minimum value was 0.4770 in 1 tree, and the greatest value was 0.7668 in 40 trees. When the number of trees was greater than 15, the validation accuracy values also fluctuated between 0.7472 and 0.7668. The focus was on the number of trees that ranged from 70 to 100, with the greatest validation accuracy value being 0.8072 in 85 and 90 trees, and the classifier accuracy was 0.9965. Herein, the key dependent factors were validation accuracy and high computing performance. Therefore, 85 was selected as the number of RF trees. Daily MODIS surface reflectance input into the RF classifier model, the four-class classification results of the daily image would be classed.

Lake Ice Threshold Value
Remote sensing extraction methods of lake ice mainly include index methods and ice threshold value methods [20,45,65,66], and many other scholars have also used visual interpretation many times [14]. Index methods were dependent on the spectral characteristics of water and ice to distinguish ice indirectly. The ice threshold value method depended on the differences between water and ice reflectances and temperature factors to distinguish ice directly. However, the most robust approaches were found to be the ice threshold value method, which used the differences in spectral bands to eliminate the atmospheric influence and system errors to some extent, and the Equation (1) is as follow: where ρ 1 and ρ 2 represent the first band (Red) and second band (Near Infrared) of MODIS MOD09GQ. a and b are the associated threshold value parameters, where a is set to 0.028, and b is set to 0.05 [67].

Water Index
Many water index methods have been developed over the past several decades and used widely. This paper selects the following six commonly used water index extraction methods based on previous research literature: TCW [23], NDWI [68], MNDWI [69], AWEI [29], WI2015 [28], and MBWI [27], as shown in Table 3 [43], and compares them with RF as reference. Table 3. Six commonly used water index methods.

Water Index Equation No
TCW where ρ 1 and ρ 2 represent the first band (Red) and second band (Near Infrared) of MODIS MOD09GQ. ρ 3 ∼ ρ 7 represent the third band (Blue) through the seventh band (Short Wave Infrared 3) of MODIS MOD09GA.

Postprocessing
The daily surface water area and ice area of Qinghai Lake were not constant but varied with time. The daily surface water area and ice area exhibited some affected values because of the clouds and haze, cloud shadow, and unfavorable illumination conditions. These factors resulted in the daily surface water coverage (surface water area divided by the total area) and ice coverage (ice area divided by the total area). Cloud filtering processing was conducted to correct some affected values, which are as follows (Figure 2).

Figure 2.
Processing flowchart of MODIS data based on the GEE platform (ice is derived from Ice Threshold Value, and water is derived from six common water index methods. The results are compared with ice and water derived from RF. The daily ice coverage and water coverage derived from RF were processed through cloud filtering. wri: i-th day water coverage; iri: i-th day ice coverage).
1. If the daily surface water coverage of the lake was less than 0.01 and the daily ice coverage was greater than 0.50, this indicated no surface water and a fully iced over the lake; thus, the surface water coverage of the lake was set to 0.00, and the ice coverage was set to 1.00.
2. For the same reason, if the daily surface water coverage was greater than 0.50, the daily ice coverage was less than 0.01, indicating full surface water and no ice. Thus, the surface water coverage was set to 1.00, and the ice coverage was set to 0.00.
3. For an arbitrary three consecutive days, if the daily surface water coverage on the first day and last day were greater than 0.99, that of the midday was less than 0.99, indicating full surface water covers three consecutive days. Thus, the surface water coverage was set to 1.00, and the ice coverage was set to 0.00. 4. For an arbitrary three consecutive days, if the daily surface water coverage on the first day and last day were less than 0.01, that on midday was greater than 0.01, indicating no surface water cover occurred on the three consecutive days. Thus, the surface water coverage was set to 0.00, and the ice coverage was set to 1.00.
5. For the same reason, for an arbitrary three consecutive days, if the daily ice coverage on the first day and last day were greater than 0.99, that on midday was less than 0.99, indicating full ice cover occurred on the three consecutive days. Thus, the ice area coverage was set to 1.00, and the surface water coverage was set to 0.00. 6. For an arbitrary three consecutive days, if the daily ice coverage on the first day and last day were less than 0.01, that on midday was greater than 0.01, indicating no ice cover occurred on the three consecutive days. Thus, the ice coverage was set to 0.00, and the surface water coverage was set to 1.00.
7. The other daily surface water coverage and ice coverage that were not affected by the clouds, cloud shadow, and unfavorable illumination conditions remained unchanged.
The critical lake phenology dates for lakes undergoing freeze-thaw cycles could be better extracted based on the above daily surface water coverage and ice coverage processes. During the one ice season, the FUS date is the day before the surface water coverage begins to be less than 1.0, and the ice coverage begins to be greater than 0.0. The FUE date is when the surface water coverage decreases to 0.0, and the ice coverage increases to 1.0. The BUS date is the day before the surface water coverage begins to be greater than 0.0, and the ice coverage begins to be less than 1.0. The BUE date is when the surface water coverage increases to 1.0 and the ice coverage decreases to 0.0. The detailed critical ice phenology extraction process is shown vividly in lake phenology section. We further validated the lake phenology through the visual interpretation method due to the lack of in-situ ice observation data. According to the date-centric principle of the above critical ice phenology dates, visual interpretation was the most direct and accurate way to judge the existence of ice [4,15].

Accuracy of RF Algorithm
The number of trees of RF was 85. Tables A1 and A2 (Appendix B) showed training and testing accuracy. Table A1 showed the training classification results of four classes based on the confusion matrix of RF. Four class results were expressed as the pixel count, the bold count stands for the predicted accurate class results, while the accuracy of RF was expressed as the percentage. The training kappa coefficient was 99.51%, the overall training accuracy of RF was 99.65%; the agreement is almost perfect. The obtained accuracies were high for all classes. Table A2 (Appendix B) showed the testing classification results of four classes based on the confusion matrix of RF. The overall testing accuracy was 80.72%, testing the kappa coefficient was 73.45%-the agreement is almost perfect. The producer accuracies were high for four classes.

Classification of RF
The RF algorithm could classify water, ice, land, and cloud classes according to trained RF classifiers based on four classes of training samples and testing samples. Cloud cover is the greatest impact for classifying the lake water and ice. Select eight images of four seasons during the 2000/2001 and 2015/2016 ice season with cloud cover. Figure 3 visually listed the classification results and results within the water mask. There are some errors of commission When the cloud covers the land, the cloud shadow might be classified as water. Meanwhile, there are some errors of omission. When the cloud covers the lake surface water or ice, the cloud cover could be classified as the cloud and little land. Classified water and ice only occur over the lake, and some land and cloud could affect the lake surface. Daily lake water and ice inside the lake were extracted by using annual water mask products. The cloud filtering was adopted to derive the lake water coverage, lake ice coverage, and lake phenology.

Comparison of RF and Water/Ice Extraction Methods
The RF algorithm could classify both the lake's daily surface water area and the ice area, but the ice threshold value method extracted only the lake ice area. The water index extraction methods extracted only the lake surface water area. Three cloud-free cover images were selected to visually demonstrate the advantages of the RF algorithm compared to the ice threshold value and six water index methods: a full ice cover scenario on 4 February 2001, an ice fraction and open water fraction scenario on 1 April 2001, and a full surface water cover scenario on 13 July 2001, as shown in Figure 4. Meanwhile, a comparison of water and ice areas is quantified in Table 4.   Figure 4 illustrates the full ice cover, ice fraction and open water fraction, and full surface water cover scenarios. In the first column, Figure 4a1-c1 show the raw images with bands 1, 4, and 3. Figure 4a2 shows the extracted full ice cover result of the RF algorithm, which was very consistent with Figure 4a1, where the ice area cover was 4217.50 km 2 , and surface water cover was 0.00 km 2 . Figure 4a3 shows the extracted full ice cover result from the ice threshold value method, consistent with Figure 4a1. However, the middle left was slightly omitted, the ice area cover was 4146.06 km 2 , and the rest of the omission area was at the middle left part. Figure 4a4-a9 showed the surface water cover results for the full ice cover scenario by water index methods consistent with the raw image's ice cover. However, there were too many omissions based on the MBWI method: the surface water area was 2254.00 km 2 , and other surface water areas were 4223.25 km 2 , 4222.50 km 2 , 4295.25 km 2 , 4288.00 km 2 , and 4286.25 km 2 based on the TCW, NDWI, MNDWI, AWEI, and WI2015 methods, respectively. These surface water areas were consistent with the ice area of RF. Figure 4b2 shows the extracted ice fraction and open water fraction results obtained using the RF algorithm, consistent with Figure 4b1. The ice area cover was 2431.31 km 2 , the surface water cover was 1769.06 km 2 , the total area was 4200.37 km 2 , and the total area was consistent with the full ice area obtained using RF, which was 4217.50 km 2 . Figure 4b3 shows the extracted ice result of the ice threshold value method. The ice area was 2115.69 km 2 , which was consistent with the ice covers of Figures 4b1 and 4b2. However, the ice cover was slightly omitted, and the surface water area was indistinct. Figure 4b4-b9 shows the surface water cover results, which were consistent with those of the raw image, but there were too many omissions by the TCW, MNDWI, and MBWI water index methods, and the surface water areas were 3742.25 km 2 , 4069.75 km 2 , and 2866.00 km 2 , respectively. Other surface water areas by the NDWI, AWEI, and WI2015 water index methods were 4136.50 km 2 , 4166.25 km 2 , and 4165.75 km 2 , respectively. These surface water areas were consistent with the ice area's sum and surface water area by RF. However, these methods could not obtain the ice cover and surface water cover areas simultaneously. Figure 4c2 shows the extracted full surface water cover result of the RF algorithm, which was very consistent with the raw image. The surface water cover area was 4202.69 km 2 , but the ice cover area showed a slight commission error of 22.00 km 2 . Figure 4c3 shows the extracted ice result of the ice threshold value method. The ice area was 45.25 km 2 , which was a commission error. Figure 4c4-c9 shows the surface water results for the full surface water cover scenario, which were consistent with the raw image, except for TCW, which showed too many omissions, and the surface water was only 1391.75 km 2 . The NDWI method was very consistent with the raw image, as the surface water area was 4214.00 km 2 ; the surface water areas were 3889.50 km 2 , 4048.75 km 2 , 4051.75 km 2 , and 3889.50 km 2 based on the MNDWI, AWEI, WI2015, and MBWI index methods, respectively, with a few omissions.

Lake Phenology for the 2016-2018 Ice Season
Based on the above cross-compared results, the ice threshold value method obtained only the ice area, and these water index methods obtained only the surface water area. The advantages of the RF algorithm was visually demonstrated. The RF algorithm could concurrently extract a more accurate daily surface water area and ice area. Therefore, the RF algorithm was an excellent method to accurately extract the lake's surface water area and ice area. Furthermore, the lake's daily surface water coverage and ice coverage and four critical ice phenology dates were calculated, as shown in Figure 2. There may be affected values in the daily surface water coverage and ice coverage; these affected values were processed using cloud filtering to obtain a more accurate dataset.
Cloud filtering processing was applied to correct the water and ice coverage and improve the accuracy of the critical ice phenology dates. To visualize the results of one full ice season using cloud filtering after using the RF algorithm, 20 raw images and the corresponding extracted water/ice cover results of RF for five adjacent dates of FUS (  (Figures 5d3 and 5d3-1) during the 2016-2017 ice season. Then, the corresponding dates were traced back to the raw images and extracted water/ice cover results to validate the critical dates, which was more reliable. To explore the more extended periods of lake phenology, Figure 6a-c shows the daily surface water coverage, ice coverage, and cloud coverage, respectively, with affected values from 1 August 2016 to 1 August 2018. The daily surface water coverage and ice coverage was affected by cloud coverage. Some days are more severe, especially during the ice period. The extracted freeze-up and break-up event dates were not obtained. Figure 6d,e show the corrected surface water coverage and ice coverage using cloud filtering and the critical ice phenology dates during the two ice season periods.

Lake Phenology for the 2000-2018 Ice Season
Setting the above cloud filtering as the processing criteria of uncorrected daily surface water coverage and ice coverage of all images ranged from 2000 to 2018, Figure 7a-c shows the uncorrected surface water coverage, ice coverage, and cloud coverage. The daily cloud coverage of all images was about 0.0. Only some days could be greater than 0.2, mainly the ice period. And Figure 7d,e shows the corrected surface water coverage and ice coverage using cloud filtering. Four critical dates for each ice season were extracted from the above results. Figure 7f shows the surface water coverage and ice coverage during each freeze-up and break-up event period. The four critical dates of lake phenology for each ice season are depicted on the horizontal axis, known as the Julian day, with FUS and FUE on the upper horizontal axis and BUS and BUE on the lower horizontal axis.

Validation
The visual interpretation method was applied to these images with the possible representation of these dates from visible imagery to recognize the four critical dates during the 2000-2018 period. According to the date-centric principle of the critical ice phenology dates, the MODIS remote sensing images were selected. Appling visual interpretation method to validate the corresponding critical ice phenology date. When the persistent cloudiness occurs, two cloudless or less clouded MODIS images at the front and back were judged. The comparison is between the extracted dates using RF and the extracted dates using the visual interpretation method during the 2000-2018 period (Figure 8). Figure 8a shows the comparison dates of FUS, where R 2 was 0.82, root mean square error (RMSE) was 2.33 days, the mean value of visual interpretation was 340.16 days; the mean value of RF was 341.89 days, and FUS was approximately early December. Approximately half of the points were on the 1:1 line, and other points were above the 1:1 line. The dates of FUS using RF were slightly greater than those using visual interpretation, with the most considerable difference in 2018 and a later Julian day in 2015-2017. Figure 8b shows the comparison dates of FUE, with an R 2 of 0.93, RMSE of 2.17 days, the mean value of visual interpretation of 372.00 days, the mean value of RF of 370.78 days, and FUE was approximately early January. Only one point was above the 1:1 line, and the other points were on or below the 1:1 line; the dates of FUE using RF were slightly less than those using visual interpretation, and the Julian day was later in 2015-2016. Figure 8c shows the comparison dates of BUS, with an R 2 of 0.84, RMSE of 3.65 days, the mean value of visual interpretation of 82.11 days, the mean value of RF of 85.37 days, and the BUS was approximately late March. Approximately half of the points were on the 1:1 line. The other points were above the 1:1 line; the dates of BUS using RF were slightly greater than those using visual interpretation, with the most considerable difference in 2014 and an earlier Julian day in 2014-2016. Figure 8d shows BUE's comparison, with an R 2 of 0.90, RMSE of 2.33 days, the mean value of visual interpretation of 99.79 days, the mean value of RF of 99.32 days, and the BUE was approximately early April. Nearly all points were on the 1:1 line, and other points were below the 1:1 line, except one point above the line. The dates of BUE using RF were very close to those using visual interpretation, with the most considerable difference in 2015 and an earlier Julian day in 2014-2016. The dates of BUS and FUS using RF were slightly overestimated, with a value of 2.50 days, and the dates of the BUE and FUE using RF were somewhat underestimated, with a value of 0.85 days. Figure 8e shows the overall results were excellent, the R 2 of 0.99, RMSE of 3.81 days. The dates using RF were slightly overestimated, and the value was 0.85 days. Thus, the critical ice phenology dates extracted using RF were highly credible.

Trends in Lake Phenology
The availability of critical ice phenology dates for each ice season illustrated the critical events of the freeze-thaw cycle changes in Qinghai Lake ( Figure A2). The four continuous freeze-thaw cycle processes are as follows: (1) surface water freeze-up, (2) full ice cover, (3) ice break-up, and (4) full surface water cover. Since the MODIS images were affected by cloud cover, the results were shown in Figures 6  and 7. The cloud coverage brought the water and ice coverage errors, resulting in underestimating of the water and ice coverage. Cloud filtering procedures were applied to improve the water and ice coverage so that the sum is equal to 1. According to the above results, four critical ice phenology dates were calculated. Figure A2 shows the availability of critical events and delayed/advanced rates for the lake's freeze-thaw cycles during 2000-2018. Statistical significance was estimated. The p-value of FUS, FUE, and BUE were 0.001, 0.001 and 0.006. Only the p-value of BUS was 0.207, as well as, the error bars to every point illustrate the associated uncertainty of the critical ice phenology.

LST and Lake Phenology
Temperature is a crucial driver of global climate change and extreme hydrological events. Lake surface conditions respond dramatically to climate change [70][71][72]. Changes of temperature could change the dates of freeze-up and break-up events; 0 • C isotherms have proven to be an effective temperature index connected to lake ice phenology [73,74]. The availability of LST derives from MODIS is relatively convenient due to the lack of meteorological data for a long time, and there is a strong agreement between LST and air temperature on a flat surface, such as lake surface [51,52,75]. The lake phenology dates could be predicted if some temperature variables from LST were captured. LST was derived from the lake's surface mean temperature (day and night). The daily LST would interact on the lake surface, causing critical ice phenology changed. To delineate LST correlations and the four critical ice phenology variables (FUS, FUE, BUS, and BUE), two key indicators, DFNLST and DFPLST, were proposed after repeated and numerous calculations and comparative analysis experiments.
Because DFNLST is ahead of FUS, and DFPLST is ahead of BUS, focus on the correlation between these two above. Figure 9a,b illustrate the correlation and availability between the critical DFNLST and FUS and between the DFPLST and BUS for the freeze-thaw cycle emphatically during 2000-2018, respectively. Figure 9a shows that the DFNLST presented a decreasing trend and a negative relationship with FUS, with a coefficient of −0.34. The discovery of this phenomenon was an exciting find. The mean value was Julian day 304.37, which was approximately 37 days before the FUS, revealing that the first negative LST (cold air) occurs increasingly earlier. Figure 9b showed that the DFPLST also presented a decreasing trend and positive relationship with BUS, with a coefficient of 0.38. A mean value of Julian day 65.26, which was approximately 20 days before the BUS, revealing that the first positive LST (warm air) occurs increasingly earlier. Sapna Sharma [76] illustrated the widespread loss of lake ice around the Northern Hemisphere in a global warming world. Shilong Piao [77] mentioned that Spring and Autumn temperatures over the northern latitudes having risen by approximately 1.1 • C and 0.8 • C over the past two decades, where the availabilities of FUS, BUS, and DFPLST were consistent with the temperature changes. However, the DFNLST availability was not consistent, because the QTP was more sensitive than other regions to LST changes under global climate change. The first cold air outbreak cooling events occurred increasingly earlier in Autumn, and simultaneously, the first warm air outbreak event occurred increasingly earlier in Spring, and the DFNLST and DFPLST trends were the opposite of the trends of mean temperatures in Autumn and Spring in the northern hemisphere [78]. The difference may lie in the use of region scale versus hemisphere scale; another cause may be derived from the strong seasonal climate variations, which involve interactions between the warm Indian monsoon and cold westerlies [79]. Therefore, two newly proposed key indicators, DFNLST and DFPLST, enabled the determination of six critical dates versus the four critical lake phenology dates. The changing trend and variation varied widely for the six critical dates of lake phenology. Figure 9c shows a changing trend in detail, with a range of variations and average values for the critical DFNLST, FUS, FUE, DFPLST, BUS, and BUE dates freeze-thaw cycle of Qinghai Lake during 2000-2018. The variation in DFNLST ranged from the middle of October to late November, with corresponding Julian day ranging from 284.69 to 324.05. The average Julian day was 304.37, the amplitude was 19.68 days, and the changing trend began earlier. These changing trends may result from Asian monsoons and atmospheric circulation patterns [80]. The FUS variation ranged from the beginning of December to the middle of December, with corresponding Julian day ranging from 336.55 to 347.23. The average Julian day was 341.89, the amplitude was 5.34 days, and the changing trend was delayed. The variation in FUE ranged from the end of December to the middle of January, with corresponding Julian day ranging from 363.54 to 13.02. The average was Julian day 5.78, the amplitude was 7.24 days, and the changing trend was also delayed. The variation in DFPLST ranged from late February to late March, with the corresponding Julian day ranging from 49.70 to 80.82. The average was Julian day 65.26, the amplitude was 15.56 days, and the changing trend occurred earlier. The BUS variation ranged from the middle of March to early April, with the corresponding Julian day ranging from 76.74 to 94.00. The average was Julian day 85.37, the amplitude was 8.63 days, and the changing trend occurred earlier. The BUE variation ranged from the beginning of April to the middle of April, with the corresponding Julian day ranging from 92.51 to 106.13. The average was Julian day 99.32, the amplitude was 6.81 days, and the changing trend also occurred earlier. The above change trends are consistent with studies that imply a hot future [81,82].
The LST sum between any two critical dates and the correlation between them would be further discussed. Meanwhile, 0 • C isotherms based on a 30-day average would be explored in future research work. There are still many defects in the present research and discussion, and more details would be discussed in future research works.

Conclusions
The focus of this research was on the lake phenology of the freeze-thaw cycle in lakes. Qinghai Lake was selected as a case study. MODIS products were used during 2000-2018, RF extracted lake surface and ice areas, the ice threshold value, and six water index (TCW, NDWI, MNDWI, AWEI, WI2015, and MBWI) methods. The optimal number of RF trees was 85. The validation accuracy value was 0.8072, the classifier accuracy was 0.9965, and comparisons found that the RF algorithm was more applicable and performed better for the freeze-thaw cycles in lakes. After applying postprocessing to address the affected values of daily lake surface water/ice/cloud coverage, the four critical ice phenology dates were correctly extracted. The results were validated using the visual interpretation method. The R 2 was 0.99, the BUS and FUS dates were overestimated by 2.50 days, the BUE and FUE dates using RF were underestimated by 0.85 days, and the total was overestimated by 0.85 days. Thus, the FUS, FUE, BUS, and BUE dates extracted using RF were credible.
The mean FUS, FUE, BUS, and BUE dates were Julian day 341.89, 370.78, 85.37, and 99.32. The delayed rates of FUS and FUE were 0.62 days/year and 0.48 days/year, respectively. And the advanced rates of BUS and BUE were 0.48 days/year and 0.07 days/year, respectively. The FUS and FUE dates presented delays of 11.21 days and 8.21 days, respectively, and the BUS and BUE dates occurred 8.59 days and 1.26 days earlier, respectively, during 2000-2018. The consideration of two newly proposed key indicators, DFNLST and DFPLST, enabled the use of six critical dates that influence lake phenology. DFNLST presented a decreasing trend and a negative relationship with FUS. DFPLST also presented a decreasing trend and a positive relationship with BUS and revealed that the first negative LST (cold air)/first positive LST (warm air) occurs increasingly earlier.
The results of this study demonstrate the high performance of the RF machine learning algorithm in GEE for lake phenology, as well as the availability of four critical dates and two key proposed dates for freeze-thaw cycles in lakes. The DFNLST and DFPLST values required rigorous discussion and theoretical support. The relationships between these two indicators and other critical dates call for further research field involvement and scientific investigations. The cloudy pixels should be accounted for, and the lake surface with cloud cover could be filled and reconstructed.