A New Retrieval Algorithm of Fractional Snow over the Tibetan Plateau Derived from AVH09C1

: Snow cover products are primarily derived from the Moderate-resolution Imaging Spectrometer (MODIS) and Advanced Very-High-Resolution Radiometer (AVHRR) datasets. MODIS achieves both snow/non-snow discrimination and snow cover fractional retrieval, while early AVHRR-based snow cover products only focused on snow/non-snow discrimination. The AVHRR Climate Data Record (AVHRR-CDR) provides a nearly 40-year global dataset that has the potential to fill the gap in long-term snow cover fractional monitoring. Our study selects the Qinghai–Tibet Plateau as the experimental area, utilizing AVHRR-CDR surface reflectance data (AVH09C1) and calibrating with the MODIS snow product MOD10A1. The snow cover percentage retrieval from the AVHRR dataset is performed using Surface Reflectance at 0.64 µ m (SR1) and Surface Reflectance at 0.86 µ m (SR2), along with a simulated Normalized Difference Snow Index (NDSI) model. Also, in order to detect the effects of land-cover type and topography on snow inversion, we tested the accuracy of the algorithm with and without these influences, respectively (vanilla algorithm and improved algorithm). The accuracy of the AVHRR snow cover percentage data product is evaluated using MOD10A1, ground snow-depth measurements and ERA5. The results indicate that the logic model based on NDSI has the best fitting effect, with R-square and RMSE values of 0.83 and 0.10, respectively. Meanwhile, the accuracy was improved after taking into account the effects of land-cover type and topography. The model is validated using MOD10A1 snow-covered areas, showing snow cover area differences of less than 4% across 6 temporal phases. The improved algorithm results in better consistency with MOD10A1 than with the vanilla algorithm. Moreover, the RMSE reaches greater levels when the elevation is below 2000 m or above 6000 m and is lower when the slope is between 16 ◦ and 20 ◦ . Using ground snow-depth measurements as ground truth, the multi-year recall rates are mostly above 0.7, with an average recall rate of 0.81. The results also show a high degree of consistency with ERA5. The validation results demonstrate that the AVHRR snow cover percentage remote sensing product proposed in this study exhibits high accuracy in the Tibetan Plateau region, also demonstrating that land-cover type and topographic factors are important to the algorithm. Our study lays the foundation for a global snow cover percentage product based on AVHRR-CDR and furthermore lays a basic work for generating a long-term AVHRR-MODIS fractional snow cover dataset.


Introduction
Since the first generation of AVHRR was sent to space in 1976, it has been possible to apply daily snow monitoring products at the global scale based on space remote sensors [1][2][3].Among these, snow monitoring based on optical sensors mainly focuses on the inversion of snow cover area, such as AVHRR, MODIS, VIIRS, etc. [4][5][6].Through systematic data processing flow, these sensors can generate long-term series snow cover products, such as Northern Hemisphere Weekly Snow Cover and Sea Ice Extent (NHSCE) [7,8], Interactive Multi-sensor Snow and Ice Mapping System (IMS) [7], Moderate Resolution Imaging Spectroradiometer (MODIS) [3,9,10], and Global Snow Monitoring for Climate Research (GlobSnow) [11].NHSCE's snow cover product provided weekly SCE from 1966 to 1997 with a spatial resolution of 190 km [8].IMS has provided daily SCE imagery at 24, 4 and 1 km spatial resolution from 1997 to the present day [7,12].The MODIS/Terra Snow Cover Daily L3 Global 500 m Grid (MOD10A1, recently version 6) is a global, gridded Fractional Snow Cover Area (FSCA) product produced daily by the National Snow and Ice Data Center (NSIDC) [3,4].It apples the Normalized Difference Snow Index (NDSI) to extract snow cover area, which leverages the contrasting reflectance of snow in the visible and shortwave infrared regions of the spectrum.FSCA is estimated using an empirical relationship between FSCA and NDSI, which apples TM data as the true data.MOD10A2, which revealed maximum snow cover, is generated by reading 8 days of 500 m resolution MOD10A1 tiles [13].Although the accuracy of MOD10A1 has reached a relative high level [14], its time series can only cover approximately twenty years.In the current research in the field of global climate change, especially in the related studies of vegetation variation in conjunction with long-term NDVI data, there have been many research works that have included the last 40 years of long time-series research when it comes to long time-series remote sensing data [15][16][17].It is therefore necessary to extend the snow product to a 40-year term length.
Earlier studies based on AVHRR snow cover inversion mostly used thresholding methods.Khlopenkov et al. implemented AVHRR-based snow cover inversion based on snow reflectivity characteristics in visible and infrared bands through the thresholding method and SPARC probability maps, and they removed cloud contamination in combination with bright temperature data [18].Husler et al. developed a snow extent retrieval method to identify Alpine snow extent using a 1 km AVHRR dataset based on Khlopenkov's research [19].Zhou et al. developed a long time-series snow extent recognition algorithm [20] based on 1 km resolution AVHRR-LAC and HRPT data, in the central Asia area, which uses an aggregated rating method that normalized the image class range from 0 to 1. Their method still belongs to the snow vs. non-snow binary classification.Hori et al. used the 3.7 µm band to replace the missing 1.64 µm band to simulate NDSI (NDSI AVHRR ) and set the discrimination threshold to 0.8 to produce a northern hemisphere snow cover dataset [21].Chen et al. calibrated the data using Landsat and adjusted the threshold for NDSI AVHRR to 0.77 based on Hori's research, creating the Qinghai-Tibet Plateau dataset [22].Prior to 2018, research predominantly focused on qualitative inversion, with the quantitative inversion of snow cover in long time-series of AVHRR gradually emerging in 2018.Hao et al. utilized NDSI AVHRR as the primary test for snow discrimination, creating a nationwide snow cover dataset for China.They adjusted the NDSI AVHRR threshold to 0.73 and 0.65 around the year 2000, and combined information from bands such as Surface Reflectance at 0.64 µm (SR1) and Surface Reflectance at 0.86 µm (SR2) to identify snow cover [23].
Previous studies tended to opt for using SR3 as a substitute for the near-infrared band in MODIS to calculate the NDSI value for AVHRR.Simultaneously, combining information from bands such as SR1 and SR2 serves as the primary basis for snow cover identification.Most snow cover products based on AVHRR still primarily involve binary classification into snowy or non-snowy conditions [24][25][26][27].However, NDSI AVHRR can be employed not only for distinguishing between snow and non-snow but also for determining the percentage of snow cover.Therefore, it is essential to explore the feasibility of constructing an FSC dataset based on NDSIAVHRR.The aim of our study is to assess the feasibility of generating FSC data based on AVHRR data in the Tibetan Plateau region.Additionally, precision evaluation of retrieval results will be conducted by incorporating MODIS snow cover products and ground observations.This study comprises six sections.Section 2 describes the study area and the datasets used in the study.Section 3 presents the details of the main algorithm.In Section 4, based on MODIS, ground observations and reanalysis data, we analyze the accuracies of AVHRR_FSC.Further discussion about the algorithm and its precision are listed in Section 6, as well as the potential problems of the AVHRR_FSC long-term dataset.Finally, in Section 6, we summarize this study and present our conclusions.

Study Area
The Tibetan Plateau, with an average elevation of 4000 m, is the third pole on Earth after the Arctic and Antarctic, formed by the collision and compression of the Indian plate against the Asian plate.During the exceptionally long winter, vast areas of the Tibetan Plateau are covered by snow.The snowmelt from this region serves as the source for many rivers, including the Yangtze, Yellow River, Indus, Ganges, and Mekong.Moreover, the snow cover on the Qinghai-Tibet Plateau is a primary water supply for vegetation growth on the plateau, providing crucial water resources for the ecosystem and human societies in the surrounding countries.Its rugged terrain makes it an ideal testing area.The selected region corresponds to the geographic definition of the Qinghai-Tibet Plateau, encompassing parts of China, Pakistan, India, Nepal, and other countries.It also includes the Pamir Plateau, the Himalayas, the Kunlun Mountains and the Tanggula Mountains.We chose three experimental areas, namely, the Pamir Plateau, the central part of the Gangdise Mountains and the central part of the Nyainqentanglha Mountains (Figure 1), to establish the relationship between AVH09C1 band information and snow cover percentage.These areas are characterized by perennial snow cover [28,29], making them ideal experimental regions.
Remote Sens. 2024, 16, x FOR PEER REVIEW 3 of 18 on MODIS, ground observations and reanalysis data, we analyze the accuracies of AVHRR_FSC.Further discussion about the algorithm and its precision are listed in Section 6, as well as the potential problems of the AVHRR_FSC long-term dataset.Finally, in Section 6, we summarize this study and present our conclusions.

Study Area
The Tibetan Plateau, with an average elevation of 4000 m, is the third pole on Earth after the Arctic and Antarctic, formed by the collision and compression of the Indian plate against the Asian plate.During the exceptionally long winter, vast areas of the Tibetan Plateau are covered by snow.The snowmelt from this region serves as the source for many rivers, including the Yangtze, Yellow River, Indus, Ganges, and Mekong.Moreover, the snow cover on the Qinghai-Tibet Plateau is a primary water supply for vegetation growth on the plateau, providing crucial water resources for the ecosystem and human societies in the surrounding countries.Its rugged terrain makes it an ideal testing area.The selected region corresponds to the geographic definition of the Qinghai-Tibet Plateau, encompassing parts of China, Pakistan, India, Nepal, and other countries.It also includes the Pamir Plateau, the Himalayas, the Kunlun Mountains and the Tanggula Mountains.We chose three experimental areas, namely, the Pamir Plateau, the central part of the Gangdise Mountains and the central part of the Nyainqentanglha Mountains (Figure 1), to establish the relationship between AVH09C1 band information and snow cover percentage.These areas are characterized by perennial snow cover [28,29], making them ideal experimental regions.

AVHRR Dataset
Recently, the Long-Term Data Record (LTDR) project, funded by NASA, created a 0.05° global daily dataset.This LTDR version 5 covers the years 1982 to 2020 and includes improved atmospheric correction and inter-calibration between sensors.It includes three main products: AVH02C1 (Level 1 product), AVH09C1 (Daily Surface Reflectance), AVH13C1 (Daily NDVI).
The AVH09C1 dataset provides continuous daily surface reflectance and brightness temperature, sourced from AVHRR sensors on NOAA polar-orbiting satellites such as NOAA-7 to NOAA-19 (Table 1).Additionally, AVH09C1 has calibrated data from various instruments since 1981 [30].According to previous research, the utility error of AVH09C1 surface reflectance is comparable to MODIS [30,31].Therefore, we choose AVH09C1 as the primary data source for snow cover percentage retrieval.

AVHRR Dataset
Recently, the Long-Term Data Record (LTDR) project, funded by NASA, created a 0.05 • global daily dataset.This LTDR version 5 covers the years 1982 to 2020 and includes improved atmospheric correction and inter-calibration between sensors.It includes three main products: AVH02C1 (Level 1 product), AVH09C1 (Daily Surface Reflectance), AVH13C1 (Daily NDVI).
The AVH09C1 dataset provides continuous daily surface reflectance and brightness temperature, sourced from AVHRR sensors on NOAA polar-orbiting satellites such as NOAA-7 to NOAA-19 (Table 1).Additionally, AVH09C1 has calibrated data from various instruments since 1981 [30].According to previous research, the utility error of AVH09C1 surface reflectance is comparable to MODIS [30,31].Therefore, we choose AVH09C1 as the primary data source for snow cover percentage retrieval.

MODIS Dataset
In the MODIS remote sensing product series, there are primarily two snow cover percentage products: MO/YD10A1 and MO/YD10C1 [32,33].Among them, MO/YD10C1 is in Climate Modeling Grid format, derived from calculations based on MO/YD10A1 snow observation values.The spatial resolution of MO/YD10A1 is 500 m.Under ideal illumination, clear sky and smooth surface conditions, the overall absolute accuracy of MOD10A1 exceeds 95% [34,35].
In this study, we selected MOD10A1 data from six temporal phases corresponding to AVH09C1 for curve fitting in the training dataset.Additionally, we chose MOD10A1 data from another six temporal phases as the validation dataset.Both the training and validation datasets (Table 2) consist of two temporal phases for each test area (Figure 1).

Ground Observations and Reanalysis Dataset
Because, MOD10A1 serves only as a simulated result, in addition to our use of MOD10A1 data for validation we also require actual ground data to evaluate the accuracy of our snow cover product.Consequently, ground snow-depth measurement data provided by the China Meteorological Administration (CMA) for the years 2000 to 2012 were utilized to validate the AVHRR FSC product.Daily snow-depth measurements were taken near each station using a professional meter stick at 8:00 A.M. Beijing time, provided that the day's snow cover in the field of view exceeded 50%.The CMA stations selected for validation were carefully chosen to avoid an overrepresentation of non-snow samples, which could affect the accuracy of the assessment.To ensure the reliability of the validation, only the measured data from stations with more than 20 days of snow coverage and a depth greater than 1 cm were selected.Additionally, we utilized the ERA5 snow cover data.ERA5 is the fifth generation of reanalysis data [36].According to previous research, ERA5 is effective in representing variables such as weather events [37,38].Therefore, we incorporated the ERA5 snow cover dataset as a comparative data source for accuracy validation.

Land-Cover Type
According to previous studies [39,40], the complexity of surface conditions affects snow cover inversion accuracy.Surface conditions are primarily classified into two categories: topographic conditions and vegetation cover conditions.Topography mainly relates to elevation and slope.Vegetation cover conditions usually refer to vegetation types.For instance, in forest areas, trees will partially obscure snow-covered surface, and the effect of trees should be considered when mapping snow cover in forests, otherwise the snow cover in forested areas may be underestimated, such as the results observed by Solberg et al. in Norway [41].For the effect of elevation, we used DEM data in representing elevation and also generated slope data based on DEM data.For the effect of surface vegetation cover, we used MCD10C1 land-cover type products based on MODIS data to reflect the surface vegetation cover types.Based on previous research [22], we roughly classified the vegetation cover types as forest, grassland or bareland.

Methods
The framework of this study mainly consists of three parts: data preprocessing, dataset production and accuracy validation (Figure 2).Data preprocessing includes quality control, cloud detection and the construction of fishnet.Quality control is primarily based on QA data from AVH09C1 to obtain valid pixels.Cloud detection will be detailed in Section 3.1.Due to the different resolutions between AVH09C1 and MOD10A1, and the much lower spatial resolution of AVH09C1 compared to MOD10A1, we construct fishnet (vector data) based on AVH09C1 to calculate statistical values for fitting and accuracy evaluation work.Dataset production mainly involves fitting core formulas and comparative analysis, with detailed content presented in Section 3.2.Accuracy analysis relies primarily on MOD10A1, ground observation data and ERA5 data, while also examining the stability of accuracy under different land surface vegetation cover types, elevations and slope conditions.

Cloud Detection
Due to potential issues of significant cloud cover overestimation in AVHRR [21,22], many studies have re-identified clouds, leading us to not directly adopt the cloudy flags.Cloud detection using AVHRR data remains a challenge due to the limited number of spectral bands.Effective cloud detection is crucial as it directly impacts the inversion re-

Cloud Detection
Due to potential issues of significant cloud cover overestimation in AVHRR [21,22], many studies have re-identified clouds, leading us to not directly adopt the cloudy flags.Cloud detection using AVHRR data remains a challenge due to the limited number of spectral bands.Effective cloud detection is crucial as it directly impacts the inversion results of target features, especially since the reflectance of snow in the visible spectrum overlaps with cloud altitudes.Currently, there are two mainstream methods for cloud monitoring: one categorizes clouds into high, medium, low and thin clouds, setting different thresholds for different cloud types [42].Another approach, represented by Hori's work, differentiates cloud monitoring between alpine and non-alpine regions and has achieved global cloud monitoring.Chen et al.'s study in the Tibetan Plateau has confirmed the adaptability of Hori's method in the region [22].Furthermore, Hao et al. have conducted a spectral analysis of cloud characteristics over China based on Hori's method, making adjustments primarily focused on the threshold selection between BT3 and BT4 [23].This study integrates the identification methods of Hori and Hao, presenting the following cloud identification method (as in Table 3).The cloud detector scans from the top to the bottom of the list for each target; a flag switch set to on indicates "cloudy", while off indicates "no cloud".A total of nine variables are used for cloud identification.It is important to note that NDSI AVHRR is a variant of NDSI, using the 3.7 µm infrared band instead of the 1.6 µm near-infrared band.

Inversion Algorithm of AVHRR FSC
We first selected SR1 and SR2 along with NDSI AVHRR for fitting the snow cover percentage.By examining the distribution characteristics of the scatter plots, we performed both linear and logistic model fittings on the results.All pixel-based sampling points are averaged every 10 values to better illustrate the fitting effect.The findings indicate that the logistic model provided a better fit for B1, B2 and NDSIAVHRR.Among these, NDSIAVHRR achieved the best fitting results, with an R-squared and RMSE (Root Mean Square Error) of 0.92 and 0.078, respectively; B2 showed the poorest fit, with an R-squared and RMSE of 0.89 and 0.095, respectively, while B1's results were intermediate.Both in terms of R-squared and RMSE, NDSI AVHRR exhibited the best fitting results, although the outcomes of the three fitting parameters were very close.Ultimately, we selected NDSI AVHRR as the key parameter for the inversion of snow cover percentage.

FSC =
89.13 × e 1.11+7.74×NDSIAV HRR 1 + e 1.11+7.74×NDSIAV HRR (1) The above formula can be regard as the vanilla algorithm.Although it achieves a high fitting effect, it does not consider the influence of the complex state of the ground surface on the snowpack inversion.In order to recognize these influences, we add other factors to improve the algorithm.According to previous studies, the main influences on snow cover inversion are elevation, slope, surface vegetation type and seasonal influence.Among them, the seasonal influence is mainly the difference in inversion models caused by different surface temperatures, and here we use BT4 as a rough characterization of surface temperature, combined with elevation and slope, which are continuous variables, and we use them together with NDSIAVHRR as variables in the logistic model.Multivariate fitting was performed separately for different vegetation cover types.The fitting results showed that the fitting results considering multiple factors were better than Equation (1).Further, we fitted the model for different vegetation cover types.Grassland had the best fit, followed by bare ground and forest, but all arrived at higher R-square, while RMSE was maintained at a low level, and finally, we used Equations (3)-( 5) as the official inverse model (regarded as the improved algorithm).
Comparing the results of Figures 3 and 4, when elevation, slope and BT4 are added as variables, R-squared and RMSE are improved.Figure 4 also shows the fitting results under different vegetation cover types, among which Grass has the widest distribution and the largest number of sampling points on the Tibetan Plateau, and has the best fitting effect, with R-squared and RMSE reaching 0.92 and 0.0745, respectively.Forest is not the main vegetation cover type, but the R-squared and RMSE also reach 0.73 and 0.0745, respectively.Finally, we used Equations (3)-( 5) as the improved algorithms for the AVHRR-FSC dataset.

Accuracies of the AVHRR FSC Product
We compared the FSC inversion results of both the vanilla algorithm and the improved algorithm for six temporal phases between AVHRR and MODIS.As shown in Figure 5, across three test areas on the Tibetan Plateau the trends in snow cover percentage distribution were largely consistent between the two, with areas identified as cloudy in AVHRR also marked as non-snow in corresponding MODIS data, suggesting an alignment in cloud identification outcomes.This indicates the snow detection algorithm's spatial reliability.
Figure 5 presents a comparison between AVHRR-FSC images of the vanilla algorithm and sub-pixel snow mapping images of MOD10A1.The visual comparison highlights differences in snow distribution, suggesting that significant discrepancies would question the sub-pixel snow spectroscopy's reliability.Based on Figure 4, snow distribution from a visual interpretation of AVHRR/2 images corresponds with sub-pixel snow mapping outcomes, further affirming the sub-pixel snow mapping algorithm's validity.
Additionally, we analyzed the standard deviation between AVHRR and MOD10A1 data for these phases, finding the largest standard deviation to be 28.68 and the smallest to be 15.05%.The most significant difference in snow-covered area among the test sites occurred on 14 December 2014, at less than 4%, with all other discrepancies below 3%.These findings, from both spatial distribution and statistical metrics, demonstrate a high congruence between AVHRR and MOD10A1 products.

Accuracies of the AVHRR FSC Product
We compared the FSC inversion results of both the vanilla algorithm and the improved algorithm for six temporal phases between AVHRR and MODIS.As shown in Figure 5, across three test areas on the Tibetan Plateau the trends in snow cover percentage distribution were largely consistent between the two, with areas identified as cloudy in AVHRR also marked as non-snow in corresponding MODIS data, suggesting an alignment in cloud identification outcomes.This indicates the snow detection algorithm's spatial reliability.
Figure 5 presents a comparison between AVHRR-FSC images of the vanilla algorithm and sub-pixel snow mapping images of MOD10A1.The visual comparison highlights differences in snow distribution, suggesting that significant discrepancies would question the sub-pixel snow spectroscopy's reliability.Based on Figure 4, snow distribution from a visual interpretation of AVHRR/2 images corresponds with sub-pixel snow mapping outcomes, further affirming the sub-pixel snow mapping algorithm's validity.
Additionally, we analyzed the standard deviation between AVHRR and MOD10A1 data for these phases, finding the largest standard deviation to be 28.68 and the smallest to be 15.05%.The most significant difference in snow-covered area among the test sites occurred on 14 December 2014, at less than 4%, with all other discrepancies below 3%.These findings, from both spatial distribution and statistical metrics, demonstrate a high congruence between AVHRR and MOD10A1 products.
Although Figure 5 demonstrates that the vanilla algorithm achieves high accuracy, the improved algorithm obtains a better result (Figure 6).For all six datasets, the snow cover area from the improved algorithm is closer to MOD10A1.Taking the time-phase data of 10 December 2009 and 6 December 2004 as examples, the vanilla algorithm recognizes the snow cover area of 8.62% and 4.41%, respectively, while the improved algorithm recognizes snow coverage of 11.26% and 6.04%, respectively.Despite the high RMSE values for both the vanilla and improved algorithms, it is important to note that RMSE values are calculated based on pixel size.This calculation method is over stringent due to the differences in spatial resolution between AVHRR and MODIS data.Even a minor misalignment in geographic coordinates can significantly increase the RMSE value, making it more appropriate as a reference rather than as an absolute measure of accuracy.Nevertheless, the statistical values of snow-covered areas and the results from visual interpretation exhibit a high degree of consistency between AVHRR-FSC data and MOD10A1.Although Figure 5 demonstrates that the vanilla algorithm achieves high accuracy, the improved algorithm obtains a better result (Figure 6).For all six datasets, the snow cover area from the improved algorithm is closer to MOD10A1.Taking the time-phase data of 10 Dec 2009 and 6 Dec 2004 as examples, the vanilla algorithm recognizes the snow cover area of 8.62% and 4.41%, respectively, while the improved algorithm recognizes snow coverage of 11.26% and 6.04%, respectively.Despite the high RMSE values for both the vanilla and improved algorithms, it is important to note that RMSE values are calculated based on pixel size.This calculation method is over stringent due to the differences in spatial resolution between AVHRR and MODIS data.Even a minor misalignment in geographic coordinates can significantly increase the RMSE value, making it more appropriate as a reference rather than as an absolute measure of accuracy.Nevertheless, the statistical values of snow-covered areas and the results from visual interpretation exhibit a high degree of consistency between AVHRR-FSC data and MOD10A1.To provide a more comprehensive comparison of the accuracy between the vanilla algorithm and the improved algorithm, we evaluated their performance under varying conditions of elevation, slope and vegetation cover types.As illustrated in Figure 7, regardless of elevation or slope, the RMSE remained below 30%.Under different elevation and slope conditions, the improved algorithm consistently exhibited lower RMSE values compared to the vanilla algorithm, and its estimates of snow-covered area were more To provide a more comprehensive comparison of the accuracy between the vanilla algorithm and the improved algorithm, we evaluated their performance under varying conditions of elevation, slope and vegetation cover types.As illustrated in Figure 7, regardless of elevation or slope, the RMSE remained below 30%.Under different elevation and slope conditions, the improved algorithm consistently exhibited lower RMSE values compared to the vanilla algorithm, and its estimates of snow-covered area were more closely aligned with MOD10A1 data.
To provide a more comprehensive comparison of the accuracy between the vanilla algorithm and the improved algorithm, we evaluated their performance under varying conditions of elevation, slope and vegetation cover types.As illustrated in Figure 7, regardless of elevation or slope, the RMSE remained below 30%.Under different elevation and slope conditions, the improved algorithm consistently exhibited lower RMSE values compared to the vanilla algorithm, and its estimates of snow-covered area were more closely aligned with MOD10A1 data.
In various elevation intervals, the RMSE values for both the vanilla and improved algorithms were highest at elevations below 2 km and above 6 km, with the lowest values occurring between 2 and 3 km.Regarding slope, the RMSE of the vanilla algorithm initially increased and then decreased, peaking for slopes between 12° and 16°, and reaching its lowest point for slopes between 16° and 20°.In contrast, the RMSE of the improved algorithm remained consistently lower than that of the vanilla algorithm within the 0° to 16° slope range, without significant variation, indicating greater stability.In various elevation intervals, the RMSE values for both the vanilla and improved algorithms were highest at elevations below 2 km and above 6 km, with the lowest values occurring between 2 and 3 km.Regarding slope, the RMSE of the vanilla algorithm initially increased and then decreased, peaking for slopes between 12 • and 16 • , and reaching its lowest point for slopes between 16 • and 20 • .In contrast, the RMSE of the improved algorithm remained consistently lower than that of the vanilla algorithm within the 0 • to 16 • slope range, without significant variation, indicating greater stability.
As shown in Figure 8, the RMSE for different vegetation cover types remains below 20%.The RMSE of the improved algorithm is consistently lower than that of the baseline algorithm, and its snow cover area is closer to that of MOD10A1.For both the baseline and the improved algorithms, bareland exhibits the lowest RMSE, although the differences among the three are minimal.Based on comprehensive analysis from multiple perspectives, the improved algorithm outperforms the baseline algorithm.Therefore, we have adopted the improved algorithm as the core algorithm for the AVHRR-FSC dataset.As shown in Figure 8, the RMSE for different vegetation cover types remains below 20%.The RMSE of the improved algorithm is consistently lower than that of the baseline algorithm, and its snow cover area is closer to that of MOD10A1.For both the baseline and the improved algorithms, bareland exhibits the lowest RMSE, although the differences among the three are minimal.Based on comprehensive analysis from multiple perspectives, the improved algorithm outperforms the baseline algorithm.Therefore, we have adopted the improved algorithm as the core algorithm for the AVHRR-FSC dataset.After completing the creation of the AVHRR-FSC dataset for the years 1982-2020, we used ground observation data and reanalysis data as supplementary validation methods.We employed snow-depth measurement data from 103 sites over 13 years, provided by the China Meteorological Administration (CMA), to validate our new AVHRR snow product.The overview of the validation results, as presented in Table 4, shows that although kappa coefficients were generally low, the recall values were significantly higher.Notably, the peak recall value was recorded in 2008 at 0.90, while the lowest was in 2012, at 0.66, After completing the creation of the AVHRR-FSC dataset for the years 1982-2020, we used ground observation data and reanalysis data as supplementary validation methods.
We employed snow-depth measurement data from 103 sites over 13 years, provided by the China Meteorological Administration (CMA), to validate our new AVHRR snow product.The overview of the validation results, as presented in Table 4, shows that although kappa coefficients were generally low, the recall values were significantly higher.Notably, the peak recall value was recorded in 2008 at 0.90, while the lowest was in 2012, at 0.66, with the recall values for the other years all above 0.73.This indicates that the inversion results demonstrated a high level of stability.The daily data from ERA5 for the year 2012 were used in our study.Due to cloud interference, the two datasets could not be directly compared.Therefore, we only compared the number of snow-covered pixels under cloud-free conditions between the two datasets, and the pixel count was evaluated based on AVHRR-FSC* snow pixels.The results indicate that the temporal variation trends of the two datasets are highly consistent (Figure 9).This indicates that the algorithm in this study has strong stability.It is important to note that both the ground observation data and the ERA5 data are based on observations from meteorological stations.There is a substantial difference in spatial scale between point data and raster data.Therefore, the accuracy assessment of the AVHRR-FSC* dataset needs to be comprehensively evaluated using Figures 6-8, Table 4

Discussion
Through comparison and validation with MODIS and meteorological station data, we have confirmed that AVHRR-derived snow cover percentages exhibit high accuracy.
It is important to note that MODIS and meteorological stations offer different perspectives for data validation.MODIS data is primarily used to assess the spatial consistency of snow inversion results.However, since the MOD10A1 does not contain cloud identification information [14], we did not directly evaluate the accuracy of cloud identification.The ability to correctly exclude cloud interference may represent one of the main challenges for

Discussion
Through comparison and validation with MODIS and meteorological station data, we have confirmed that AVHRR-derived snow cover percentages exhibit high accuracy.It is important to note that MODIS and meteorological stations offer different perspectives for data validation.MODIS data is primarily used to assess the spatial consistency of snow inversion results.However, since the MOD10A1 does not contain cloud identification information [14], we did not directly evaluate the accuracy of cloud identification.The ability to correctly exclude cloud interference may represent one of the main challenges for AVHRR in snow inversion.In the example from 10 December 2009, in Figure 6, we observe that MODIS identified parts of the western Nyingchi Tanggula mountain as snow-free, whereas AVHRR results indicated significant cloud interference in this area, with more consistent identification results in other areas.
While many studies use meteorological station data to validate the accuracy of remotely sensed snow products [43][44][45], meteorological station snow-depth measurements are theoretically not ideal for validation purposes.This is because the stations provide point data, whereas AVHRR has a spatial resolution of 0.05 • .Within a grid, the position of the station and the actual distribution of snow cover can greatly affect the accuracy assessment.However, the recall metric is very important for data from meteorological stations.This is because, if a station measures a certain snow depth, it confirms the presence of snow cover in that area, implying the AVHRR-detected snow cover percentage should be greater than 0. However, a snow depth of 0 at a station does not mean that there is no snow cover within the entire 0.05 • × 0.05 • grid of the station's location.Therefore, for accuracy assessments based on meteorological stations, the significance of Recall is greater than that of Precision, Kappa and other evaluation metrics.In this study, the Recall values of the AVHRR snow cover product relative to meteorological station snow-depth data are at a high level, indicating that our algorithm has a high rate of snow cover detection.
Metsämäki et al., based on AVHRR and observational data, performed linear interpolation between reference reflectances under full snow cover conditions (SCA = 100%) and no snow conditions (SCA = 0%), different from the logistic model-based interpolation method used in our study, to estimate the Snow Cover Area (SCA) [46].When validated against in situ SCA measurements from snow tracks, the overall accuracy was 85%, and when validated against snow measurements from meteorological stations, the accuracy was 64.4%.With the overall accuracy above 99% in our study (calculated from Figure 6), our sub-pixel snow mapping algorithm should be considered reliable and feasible.
The most significant finding of our study is the determination that the percentage of snow cover exhibits a logistic curve relationship with NDSI AVHRR , whereas previous research results were often linear, such as when Hall et al. used a linear equation to simulate snow cover percentages with NDSI for MODIS data inversion.Additionally, spectral mixture models also differ from logistic models, which might relate to the data processing workflow and the selection of validation datasets.Nevertheless, this study has confirmed that the logistic model based on NDSI AVHRR achieves high accuracy.
It is worth mentioning that Hüsler's algorithm for inverting snow cover probability also utilized a logistic model and validated it with on-site data captured by network cameras.However, their logistic model was primarily used for calculating snow cover probability [19].Logistic regression is more commonly applied in classification problems, and we employed it only because the fitting curve more accurately matched the characteristics of the scatter distribution.Currently, it is unclear whether their findings are in any way related to our research, which remains an area for future investigation.
The accuracy of snow cover detection algorithms is associated with various factors, such as elevation, surface roughness, etc. [47][48][49].This study systematically analyzes, by distinguishing between the vanilla and improved algorithms, the effects of vegetation cover type and topographic elements on the accuracy of FSC inversion.The importance of considering these factors in the algorithm is demonstrated.In this study, slope is used as a proxy for surface roughness.Previous research has shown that elevation significantly influences accuracy, with snow detection precision tending to decrease as elevation increases.Despite the reduced snow detection accuracy at elevations above 6000 m in this study, this phenomenon is attributed to the unique geographical environment of the high mountain regions in Asia and the selection of validation images by date.Due to the high elevations in the high mountain regions of Asia, areas above 5000 m are predominantly covered by grasslands, leading to severe fragmentation of snow cover.This finding is consistent with previous research for elevations significantly below the 2000 to 6000 m range [49,50].However, there was no significant decrease in accuracy within the 2000 to 6000 m range, and accuracy was lower at elevations below 2000 m.In the Tibetan Plateau region, areas below 2000 m, such as the Qaidam Basin and valleys on the southern edge of the plateau transitioning to the high plateau, experience climates transitioning from subtropical arid to subtropical forest climates to high plateau climates.Therefore, snow fragmentation is particularly severe in these areas.Additionally, it is important to note that the validation images selected were from the winter season when snow cover is more extensive, resulting in more concentrated snow cover in these areas and higher validation accuracy.
Due to cloud interference, the dataset created in this study cannot yet be directly applied to global change research [51].Future work could involve the creation of longterm, cloud-free datasets through the fusion of multi-temporal and multi-source data.It is important to note that current long-term, cloud-free snow cover datasets are based on a binary snow/no-snow classification, utilizing multi-temporal fusion to mitigate some cloud effects and further integrating with passive microwave data to completely eliminate cloud influences [52][53][54].However, for snow cover percentage data, while multi-temporal fusion allows for averaging pixel values, establishing a conversion relationship between snowdepth measurements and snow cover percentages when fusing with passive microwave data remains a technical challenge and a potential direction for future research.
In this study, we used MODIS snow products as training samples instead of selecting higher-resolution TM data, as many studies did, which could be seen as contentious.It is worth mentioning that our choice of the MODIS dataset was driven by two main reasons.Firstly, the MODIS snow product has undergone numerous iterations and updates, and its accuracy has been validated across many global regions, offering strong reliability [44,55,56].Secondly, using MODIS as a training sample allows for easier balancing of RMSE to ensure consistency between AVHRR and MODIS inversion results, laying the groundwork for the next step of creating a long-term AVHRR-MODIS-VIIRS dataset.In the future, we plan to reference the NDVI dataset to create a global long-term FSC data product starting from 1981 and spanning over 40 years.

Conclusions
This study developed a snow cover dataset for the Tibetan Plateau region based on the long-term AVHRR dataset.Using MODIS data as ground truth, we fitted Bands 1, 2 and NDSI AVHRR , and the results indicated that Band 1's logistic curve fitting was the most optimal, achieving an R-squared of 0.9 for the training dataset and an RMSE of less than 0.5 for the prediction dataset.Furthermore, we analyzed the influence of factors such as surface vegetation cover type, elevation and slope on the algorithm by distinguishing between the vanilla algorithm and the improved algorithm.We assessed the importance of these factors and ultimately adopted the improved algorithm as the core algorithm for the AVHRR-FSC dataset.Through spatial comparison analysis with the MODIS dataset, we observed that the spatial distribution of AVHRR snow cover inversion results was consistent with those from MOD10A1, with a standard deviation of less than 0.2 for each spatial pixel.Considering that terrain might impact inversion accuracy, we analyzed accuracy under different elevation conditions, revealing that consistency between AVHRR and MOD10A1 inversion results increased with elevation.Comparison analysis with actual meteorological station data showed that, except for a Recall value of 0.66 in 2012, all other values were greater than 0.73, indicating that most snow-covered areas were detected.Additionally, the results also show a high degree of consistency with ERA5.This study demonstrates the feasibility of AVHRR inversion for snow cover percentage and provides the optimal fitting curve.The operational snow inversion technique designed in this study lays the foundation for future global AVHRR snow cover percentage product development.

Figure 1 .
Figure 1.Study area and test areas for regression.

Figure 1 .
Figure 1.Study area and test areas for regression.

18 Figure 2 .
Figure 2. Flowchart of AVHRR-FSC generation and accuracy analysis in this study.

Figure 2 .
Figure 2. Flowchart of AVHRR-FSC generation and accuracy analysis in this study.

Figure 3 .
Figure 3. Linear fitting of SR1, SR2 and NDSIAVHRR with MOD10A1 for training data.The blue represents the fit line, and the dash line represents the perfect prediction line.(a) the linear fitt of SR1 with MOD10A1; (b) the linear fitting of SR2 with MOD10A1; (c) the linear fitting of NDSIAVHRR with MOD10A1.

Figure 3 .
Figure 3. Linear fitting of SR1, SR2 and NDSI AVHRR with MOD10A1 for training data.The blue line represents the fit line, and the dash line represents the perfect prediction line.(a) the linear fitting of SR1 with MOD10A1; (b) the linear fitting of SR2 with MOD10A1; (c) the linear fitting of NDSI AVHRR with MOD10A1.

Figure 4 .
Figure 4. Scatter density plots for Logistic fitting results after considering elevation, slop, BT4 and comparison of predicted values with MOD10A1.The four subplots are related to uncategorized, grass, forest and bareland for vegetation cover types.The dash line represents the perfect prediction line.

Figure 4 .
Figure 4. Scatter density plots for Logistic fitting results after considering elevation, slop, BT4 and comparison of predicted values with MOD10A1.The four subplots are related to uncategorized, grass, forest and bareland for vegetation cover types.The dash line represents the perfect prediction line.

18 Figure 5 .
Figure 5.Comparison of AVHRR_FSC (the vanilla algorithm) and MODIS_FSC mapping; AVHRR_FSC and MODIS_FSC stand for total snow cover area, and RMSE is pixel-based calculated. 1, 2, and 3 correspond to the three test areas.

Figure 5 . 18 Figure 6 .
Figure 5.Comparison of AVHRR_FSC (the vanilla algorithm) and MODIS_FSC mapping; AVHRR_FSC and MODIS_FSC stand for total snow cover area, and RMSE is pixel-based calculated. 1, 2, and 3 correspond to the three test areas.Remote Sens. 2024, 16, x FOR PEER REVIEW 12 of 18

Figure 6 .
Figure 6.Comparison of AVHRR_FSC* (the improved algorithm) and MODIS_FSC mapping; AVHRR_FSC* and MODIS_FSC stand for total snow cover area, and RMSE is pixel-based calculated. 1, 2, and 3 correspond to the three test areas.

Figure 7 .
Figure 7. Trends in RMSE of AVHRR FSC inversion results with elevation and slope variation, using MOD10A1 as the ground truth.RMSE and AVHRR_FSC stand for the vanilla algorithm, RMSE* and AVHRR_FSC* stand for the improved algorithm.(a) trends in RMSE of AVHRR FSC inversion results with elevation; (b) Trends in RMSE of AVHRR FSC inversion results with slop.

Figure 7 .
Figure 7. Trends in RMSE of AVHRR FSC inversion results with elevation and slope variation, using MOD10A1 as the ground truth.RMSE and AVHRR_FSC stand for the vanilla algorithm, RMSE* and AVHRR_FSC* stand for the improved algorithm.(a) trends in RMSE of AVHRR FSC inversion results with elevation; (b) Trends in RMSE of AVHRR FSC inversion results with slop.

Figure 8 .
Figure 8. Trends in RMSE of AVHRR FSC inversion results with different vegetation types, using MOD10A1 as the ground truth.RMSE and AVHRR_FSC stand for the vanilla algorithm, RMSE* and AVHRR_FSC* stand for improved algorithm.

Figure 8 .
Figure 8. Trends in RMSE of AVHRR FSC inversion results with different vegetation types, using MOD10A1 as the ground truth.RMSE and AVHRR_FSC stand for the vanilla algorithm, RMSE* and AVHRR_FSC* stand for improved algorithm.

Figure 9 .
Figure 9. Snow cover pixel count of AVHRR-FSC* compared to ERA5.AVHRR-FSC* is the result of the improved algorithm.(a) shows the comparison of daily snow cover pixel count; (b) shows the consistency between the two datasets.

Figure 9 .
Figure 9. Snow cover pixel count of AVHRR-FSC* compared to ERA5.AVHRR-FSC* is the result of the improved algorithm.(a) shows the comparison of daily snow cover pixel count; (b) shows the consistency between the two datasets.

Table 1 .
The band information of AVH09C1.

Table 3 .
Cloud detection tests and corresponding thresholds.

Table 4 .
The validation results of the AVHRR_FSC* based on ground snow-depth measurements.
and Figure 9.