Precipitation Retrieval over the Tibetan Plateau from the Geostationary Orbit—Part 1: Precipitation Area Delineation with Elektro-L2 and Insat-3D

The lack of long term and well distributed precipitation observations on the Tibetan Plateau (TiP) with its complex terrain raises the need for other sources of precipitation data for this area. Satellite-based precipitation retrievals can fill those data gaps. Before precipitation rates can be retrieved from satellite imagery, the precipitating area needs to be classified properly. Here, we present a feasibility study of a precipitation area delineation scheme for the TiP based on multispectral data with data fusion from the geostationary orbit (GEO, Insat-3D and Elektro-L2) and a machine learning approach (Random Forest, RF). The GEO data are used as predictors for the RF model, extensively validated by independent GPM (Global Precipitation Measurement Mission) IMERG (Integrated Multi-satellitE Retrievals for GPM) gauge calibrated microwave (MW) best-quality precipitation estimates. To improve the RF model performance, we tested different optimization schemes. Here, we find that (1) using more precipitating pixels and reducing the amount of non-precipitating pixels during training greatly improved the classification results. The accuracy of the precipitation area delineation also benefits from (2) changing the temporal resolution into smaller segments. We particularly compared our results to the Infrared (IR) only precipitation product from GPM IMERG and found a markedly improved performance of the new multispectral product (Heidke Skill Score (HSS) of 0.19 (IR only) compared to 0.57 (new multispectral product)). Other studies with a precipitation area delineation obtained a probability of detection (POD) of 0.61, whereas our POD is comparable, with 0.56 on average. The new multispectral product performs best (worse) for precipitation rates above the 90th percentile (below the 10th percentile). Our results point to a clear strategy to improve the IMERG product in the absence of MW radiances.


Introduction
The Tibetan Plateau (TiP) is known to be crucial (1) for the global atmospheric circulation with special reference to the east Asian monsoon system and (2) for humans who depend on the water supply by rivers originating in the high mountains of the TiP [1]. Particularly for (2), the exact knowledge on the spatial extent and temporal development of precipitation is of utmost importance. Unfortunately, precipitation on the TiP is not well observed. The high elevated plateau (ca. 4500 m a.s.l. on average) with its complex terrain challenges the installation and maintenance of climate stations and weather radars [2]. On the monthly and annual scale, many attempts have been made over the last few decades to extrapolate the rain gauge networks with a coarse spatial resolution to produce In summary, there is still a large amount of uncertainty in precipitation retrieval even in the newest generation products due to various reasons. One main problem seems to be the still insufficient delineation of the precipitating areas in different regimes (dry, moist areas) and topographies, combined with a very restricted use of GEO IR data in the hybrid products.
Thus, we generally focus on an improvement of precipitation retrieval by (i) the best GEO selection for the TiP, (ii) combining the full spectral range of second generation GEO systems with (iii) the powerful tools of machine learning technology (Random Forest, RF). In this paper, we develop a precipitation methodology to delineate precipitating areas using a combination of Insat-3D and Elektro-L2. We therefore present various tests to select the best model to delineate precipitating areas on the TiP. RF models are trained by the high quality, gauge calibrated MW estimates from IMERG [28,31]. The classification results are validated against independent GPM IMERG data not used for model training.

Data and Methods
This section gives an overview of the data used in this study. Then we describe the processing scheme of the precipitation area delineation. Furthermore, we provide information on the RF modeling in general and on the concept of our modeling process. The study area comprises the region from 25 • N-45 • N and 65 • E-105 • E to cover the whole TiP, and we restrict our analyses only to areas which belong to the TiP, located above 2500 m to exclude areas where the formation of precipitation differs from the formation of precipitation on the TiP.

Data
The data used for optimizing the precipitation area delineation are described in the following subsections. We chose the GEO satellites Insat-3D and Elektro-L2 due to their spatial resolution of 4 km and their central view on the TiP. The study period ranges from 5 March 2017 to 4 October 2017. Figure 1 describes the data amount on a temporal and spatial scale which is available for training and validation. The rainy season is June, July and August. The other months do not receive much precipitation [8,40]. Since March, April and October lack lots of data, we decided to focus on May, June, July, August and September, where much more data are available. The spatial data availability is overall evenly distributed on the TiP; however, there are some areas which are frequently masked out due to IMERG's quality index. These areas are mostly covered with ice and snow.

Geostationary Satellite Products from Insat-3D and Elektro-L2
Detailed information about the satellite data from Insat-3D and Elektro-L2 can be found in Tables 1  and 2. The sun elevation angle from Insat-3D was used to calculate the sun azimuth angle.
The spatiotemporal resolution of the cloud mask provided by the Indian Space Research Organization MOSDAC is 4 km and 30 min. The classification of the cloud mask uses radiances from the 3.9 µm, 10.8 µm and 11.9 µm bands [41].

Mar
Apr   The GPM core satellite was launched in February 2014 and covers the area between 60 • N-60 • S. The GPM Core Observatory satellite carries a dual-frequency precipitation radar (DPR) and a 13-channel passive microwave (PMW) imager (GMI) [32,45].
GPM offers several data levels (Level 1-3). Level 3 data are gridded, accumulated multi-satellite merged data sets such as IMERG where the GMI is the calibrator among various sensors from other low earth orbiting (LEO) satellites.
GPM IMERG is available for three different latency periods depending on the user's requirements of application. An early (6 h, real-time applications), late (12 h, e.g., crop forecasting) and final run (4 months, research purposes) can be assessed [28,32]. The final run recommended for research purposes is used in this study [47].

Additional Data
A digital elevation model (DEM) from the Global 30 Arc-Second Elevation Data Set is used [48]. It is provided in a EPSG: 4326 projection (WGS 84) covering the whole globe. The data were adjusted to the study area of the TiP (25 • N-45 • N, 65 • E-105 • E), and the same spatial resolution as GPM IMERG of 0.1 • was assigned using the nearest neighbor interpolation. Different topographical indices were calculated like the topographic position index (TPI) [49], the terrain ruggedness index (TRI) [50] and slope and aspect using the Grass module r.slope.aspect [51].
Furthermore, geostatistical texture features were computed. They are used as predictors for the modeling. Geostatistical texture features characterize textures based on the relationship between the similarity and distance of neighboring pixels. The IR bands from Elektro-L2 and Insat-3D served as input for these calculations. The geostatistical texture features might be helpful to distinguish different cloud types. We use the variogram (VAR), madogram (MAD), rodogram (ROD), cross variogram (CV), and pseudo cross variogram (PCV), which were used in similar studies for the derivation of atmospheric properties [52]. In addition, we calculated the band differences for each possible combination. All predictors are listed in Table 3 (bands, band differences, geostatistical texture features and ancillary data). ∆ T 9.7-3.9 Aspect ∆ T 9.7-6.8 Tangential curvature ∆ T 9.7-8 Profile curvature ∆ T 10. 8-6.8 Satellite Elevation Angle ∆ T 10. 8-11.9 Satellite Azimuth Angle ∆ T 10.8-3.9 Partly static ∆ T 10.8-8 Solar Zenith Angle ∆ T 10. 8-9.7 Sun Azimuth Angle

Processing Scheme of the Precipitation Area Delineation
We aim to identify precipitating areas (areas with and without precipitation) on the TiP based on an RF machine learning approach using multispectral IR data from GEO satellites and (partly) static variables as predictors ( Figure 2). In a first step, we preprocessed the satellite data to get a uniform data set regarding spatio-temporal resolution and data quality for the modeling. To this end, the GEO data were first projected to a common reference system (EPSG: 4326; WGS84). The satellite data and GPM IMERG were cropped to the research area (25 • N-45 • N, 65 • E-105 • E). The 4 km and 8 km resolutions of the GEO satellite data were resized to match the IMERG resolution of 0.1 • using nearest neighbor interpolation. Since all data are available at 30 min resolution, no temporal aggregation was performed.
We employ a combination of Insat-3D and Elektro-L2 because, on the one hand, Elektro-L2 contains a variety of useful bands; however, these data are not operationally available. On the other hand, Insat-3D is operationally available but lacks bands that are useful for cloud phase detection. A combination of both satellites overcomes these issues and allows for a precipitation area delineation for the common period (see Table 1 for details).
Not all the data were chosen for the modeling; therefore, data were masked out that (1) were not assigned a "Quality Index for precipitationCal field" (precipitationQualityIndex) > 0.9 in order to use only accurate "Multi-satellite precipitation estimate with gauge calibration" (precipitationCal) that is based on MW based rainfall information [32,46,53], (2) were not assigned as cloudy in the Insat-3D cloud mask and (3) where the difference between the Insat-3D scanning time differed more than 7 min from the "Microwave satellite observation time" (HQobservationTime) of GPM IMERG to account for close timing between the satellite overflight and the precipitation events observed in IMERG. Since no visible (VIS) and near infrared (NIR) bands are used, no differentiation between daytime, nighttime and twilight is applied. We used the "Multi-satellite precipitation estimate with gauge calibration" (precipitationCal) from the final run of the IMERG product which is a combined product of different satellite sensors. We used the "Quality Index for precipitationCal field" to extract only the most reliable MW based rainfall information from the precipitationCal data [53]. In the next step, we used the information of the "Microwave satellite observation time" (HQobservationTime) to calculate the time difference between the Insat-3D scanning time and the time of overflight of the respective MW sensors.
The data set consists of (1) Insat-3D bands, the cloud mask, the scan time, sun and satellite azimuth and elevation, (2) Elektro-L2 bands, and (3) the IMERG data that contains the MW gauge calibrated precipitation, IR only precipitation, observation time, observation source, and quality index. All of the data also contain latitude and longitude.
The preprocessed data set was then split into two subsets for training (80%) and validation (20%) for each scene. The training data are used for the feature selection, model parameter tuning and training described in the following. The feature selection and model parameter tuning are performed for each month, whereas the training was performed for each balanced data set (see Figure 3 for details) and temporal resolution separately. The feature selection results in a subset of most relevant features out of the initial feature space. The RF models are trained using the training data set and are then applied to the validation data. The resulting precipitating area is validated against an IMERG's gauge calibrated MW precipitating area that was not used for training.   A major part of RF model training is to find the best performing procedure for modeling the precipitating area ( Figure 3). First, we aim to find the best temporal resolution for model training which might be monthly, weekly, daily or scene based. In addition, we test whether we need to change the balance of the precipitating (which are rare) and non-precipitating pixels (which are usually frequent). Instead of training the model with all cloudy pixels, which include an enormous amount of non-precipitating pixels, we test the model performance with different training sets: all precipitation pixels and (i) the same (ratio of 1:1), (ii) double (ratio1:2) or (iii) triple (ratio of 1:3) amount of cloud-covered, non-precipitation pixels for training. This procedure should help the individual models to learn how to differentiate between cloud areas with and without precipitation. In a last optimizing step, we add static variables as predictors (see Table 3 for details) to the model process. Therefore, we perform another feature selection and a new model parameter tuning, which both contains the static predictors and trains the model using the best temporal resolution and the best ratio for the precipitating pixels.

General Concept of Random Forests
The RF technique, which was first proposed by Breiman [54], is an ensemble decision tree algorithm for classification and regression and is constantly used for various studies on meteorological remote sensing [15,16,20,52,55]. The algorithm is able to fit numerous classification and regression trees to data and to combine the model predictions according to the trees in the forest. Bootstrap samples from 2 / 3 of the training data set with replacement are selected and, for each bootstrap, subsets are used to grow decision trees. A set of trees are more accurate and reliable compared to a single tree. Each tree votes for one specific class. The final decision is made by choosing the class with the highest amount of votes. The out-of-bag (OOB) error is calculated based on the 2 / 3 of the training data. The RF model is applied to the remaining 1 / 3 of the OOB data. The OOB score is used to classify the error rate and is the measure for the internal validation of the RF model [15]. The calculation of the OOB score is explained in detail in Breiman [56]. The RF model also provides feature importances which are useful to identify the most important predictors. The RF models are prediction algorithms and can efficiently run on big data. The RF models map nonlinear relationships well [54]. The Python Scikit-learn package was used to implement the RF method for the current study [57].

Random Forest Classification
IMERG's gauge calibrated MW precipitation serves as the training reference for RF modeling. In a first step, a recursive backward feature elimination was applied to reduce the number of predictors and to find the best performing feature subset for the classification to improve the performance of the RF model [58]. About 70 predictors are initially used for the feature selection.
An RF classification was used for the feature selection for each month by aggregating all training scenes. From this data set, 10% are randomly selected, whereas we set the condition that 20% of the randomly selected data must be rainy pixels and the RF classification is run 50 times to account for stable results. The OOB score and the feature importance resulting from the classification are saved. The least important feature is deleted from the feature list and a new model is fit using the reduced feature space. The procedure of deleting the least important feature and saving the feature importances is repeated until only one feature is left in the classification. Finally, a mean feature importance for each feature was calculated and the predictors were ranked based on the order of their elimination.
The feature selection which precisely determines the predictors relevant for the detection of precipitating areas was performed on a monthly scale to account for seasonality. In the end, 31 predictors on average for the whole data set remain which are further used as predictor variables in the tuning and training of the RF model (see Figure 4). Sun Azimut Angle -T 6.8 -11.9 Solar Zenit Angle 3.9 µm PCV(11.9, 6.8) ∆ T 10.8 -6.8 11.9 µm PCV(10.8, 6.8) In order to find the optimal number of trees in the forest and the optimal number of features to consider when looking for the best split, these parameters of the RF models were tuned. The tuning was performed by randomly selecting 10% of the training data set, whereas 20% of the randomly selected data needed to be rainy pixels and repeating the model tuning 50 times for both model parameters to ensure stable results. The number of features for the best split that were tested range from 2 to 10 by 2 (2 ,4, 6, 8, 10) and the number of trees in the forest range from 100 to 400 by 30 (100, 130, 160, 190, 220, 250, 280, 310, 340, 370, 400). In order to save calculation time and power, tuning was performed (1) on a monthly scale and (2) with fixed ranges of the model tuning parameters. The number of features for the best split varies between 4 and 10 and the number of trees varies between 370 and 400.
The final RF classification models are trained using the optimal predictors and the optimal tuning parameters derived above. As training data a random subset of 80% of the data of each scene are used. The training subset is comprised by the remaining 20%. Please note that only data which fulfilled the conditions are used for training and validation (see Section 2.2 and Figure 2).
We first test all different temporal resolutions with all different versions of the balanced data set regarding the ratio of precipitating and non-precipitating pixels. For all of these tests, we use the results from the feature selection and model parameter tuning. The best temporal resolution with the best version of the balanced data set is used to test whether static predictors have an effect on the quality of the modeling. Therefore, another feature selection and model tuning which comprise the static predictors is performed (see Figure 2 and Table 3).

Validation of the Random Forest Models
The RF models were validated against IMERG's gauge calibrated MW precipitation from which the categorical scores (TP, TN, FN, FP; for abbreviations see Table 4) are obtained. The following validation measures are used to evaluate the model performance: POD, POFD, FAR, HSS and PC (for abbreviations see Table 5). The POD describes the proportion of correctly classified rainy pixels, whereas the POFD illustrates the percentage of non-precipitating pixels which were incorrectly classified as rainy by the model. The FAR describes the proportion of incorrectly rainy classified pixels. The HSS evaluates the overall performance of the model. The PC accounts for the proportion between correctly classified rainy and non-rainy pixels with regard to all pixels (correctly and not correctly classified pixels). Since we aim to improve the IR only precipitation from GPM IMERG, we compare our classification results based on multispectral data to the IR only precipitation product. Table 4 gives an overview of the confusion matrix which is used for the calculation of the validation measures (see Table 5).

Selection of the Best Model
This chapter summarizes the tests which were performed to select the best model for the precipitation area delineation with regard to the validation measures. To do so, we chose eight days from each available month (May, June, July, August, September) to test the differences of the models. These test days from each month were selected according to observed precipitation events in the IMERG data.
It is known that the TiP is a semi-arid region where 60-70% of the precipitation on the TiP occur in the summer months (June, July, August) [8,40]. The model therefore faces challenges in learning to differentiate between precipitating and non-precipitating data due to the low amount of rainy pixels. We assume that selecting completely random data for training leads to an underrepresentation of the rainy class. Reducing the amount of non-precipitating pixels compared to rainy pixels is known to be helpful for the model to better capture the precipitating areas (refer to [21,61]). We tested the (i) unbalanced, (ii) same (ratio of 1:1), (iii) double (ratio1:2) or (iiii) triple (ratio of 1:3) amount of pixels for training.
It is important to account for the seasonality of precipitation since the rainfall amount in May differs a lot from the amount in July where precipitation is influenced by the monsoon. Therefore, we tested the temporal resolution of the RF models on a monthly, weekly, daily and scene based resolution. Monthly or weekly trained models might be useful to gain more rainy data and therefore helping the model to learn when precipitation occurs which we expect to be most helpful in rather dry months like May or September. Daily or scene based trained models might be more precise than weekly or monthly trained models because they only capture the precipitation signals from a specific day or scene.
The relation between terrain and precipitation is well known [13,14]. We assume to enhance our precipitation area delineation by adding static predictors to the RF training (see Table 3 for details). For example, Yamamoto et al. [62] and Kidd et al. [63] showed that altitude has an impact on the performance of rainfall detection in Nepal. Their results prove that static predictors enhance the detection of rainfall areas. In addition, other studies found the use of static predictors like the DEM helpful [61]. For this purpose, we performed a separate feature selection on a monthly basis with the best temporal resolution to include the static predictors (Table 3).

Comparison to IMERG's IR Only Precipitation Estimate
We used six multispectral bands from two GEO satellites with reference to the gauge calibrated MW precipitation estimates from IMERG to detect the precipitating areas on the TiP. The results of the presented precipitation area delineation were compared with the IR only precipitation product from IMERG which only uses the 10.8 µm band for precipitation retrieval. We assume that the use of multispectral satellite data outperforms IMERG's single band precipitation product.

Results of the Selection of the Best Model
The feature selection and tuning were performed on a monthly basis, whereas the training was performed on a monthly, weekly, daily and scene based resolution. All of these temporal variations were also tested using a different amount of non-rainy pixels (unbalanced, ratio of 1:1, ratio1:2 and ratio of 1:3).
The results from the model training on a monthly and unbalanced selected pixel basis revealed the following model performance. The POD was very high ranging from 0.76-0.83 for the five test weeks and the POFD was very low at around 0.08. However, the FAR was very high (0.76 on average). The HSS was especially low in the less rainy seasons like May and September with values 0.15-0.25 and higher in the rainy months like June, July and August (0.37 on average). The PC was overall very high at 0.93 since most rainy pixels are recognized correctly by the model.
The POD was for all temporal resolutions stable and ranged from 0.73-0.87 with an unbalanced choice of pixels regarding all test weeks. When using the ratio of 1:1, the POD dropped for all temporal resolutions because this ratio highly overestimates the true distribution of rainfall data. The POD ranged between 0.22-0.41. We tested the ratio1:2 which resulted in a higher POD compared the ratio of 1:1 and a lower POD than the unbalanced choice (0.33-0.52). Training the RF models with a ratio of 1:3 led to a moderate POD of 0.40-0.61 for all temporal resolutions. The POFD was overall low for all tests ranging between 0.01-0.08.
The FAR decreased in all test weeks with increasing temporal resolution (e.g., from a monthly to weekly resolution) using the unbalanced option when training the RF model. Using the unbalanced choice of pixels, however, led to enormous high FAR up to 0.9. The FAR dropped rapidly when the models were trained using the ratio of 1:1. Modeling with the ratio of 1:1 led to an improvement of the FAR on all time scales which could be reduced down to 0.11. The FAR increased for all temporal scales when the models were trained using the ratio1:2 and ratio of 1:3. However, the POD in both cases was higher compared to the POD of the ratio of 1:1 at all temporal scales.
The HSS displays the overall performance of the model. The HSS increased when the temporal resolution decreased. This is due to the decreasing FAR with decreasing temporal resolution. The highest HSS was found when training the model on a scene basis. Regarding the amount of rainy pixels for training, we find that the HSS was highest when trained with the ratio of 1:3.
The PC was lowest for all temporal resolutions when the model is trained with a ratio of 1:1 (0.79-0.86). It increased for all test weeks with a ratio of 1:2 and reached very good results when the model is either trained unbalanced or with a ratio of 1:3 (up to 0.93). The PC in the unbalanced ratio was very high due to the many correctly negative classified pixels.
To summarize, we find that training the models on a scene basis together with the ratio of 1:3 performs best. This means that we choose all rainy pixels and the triple amount of non-rainy pixels for each scene separately. We also find that the results improve with increasing temporal resolution (see Figure 5). Although other studies improved their model performance by the use of static predictors, we found that there is hardly any difference when training the model using static predictors on a scene basis with a ratio of 1:3 (see Figure 2 and Table 3). In order to test the effect of static predictors, we performed a separate feature selection and a separate RF parameter tuning on a monthly basis (see Table 6 for the comparison).  Figure 6 displays the validation scores for the precipitation area delineation for 5-12 July 2017. There are some differences in the POD, FAR and HSS, but hardly any differences in the POFD and PC. The HSS is highest when the POD is high and the FAR low.  Figure 7 states that low precipitation rates (<10th percentile) are prone to errors with a relatively low POD and a high FAR, whereas high precipitation rates (>90th percentile) display a very POD (ca. 0.9), and a low FAR (ca. 0.2). The POD and HSS increase with increasing precipitation rates, whereas the FAR decreases with increasing precipitation rates.
A reason for that might be that clouds with low precipitation amounts do not differ much from non-precipitating clouds in terms of their top temperatures in the different channels. By contrast, the top cloud temperatures from clouds that lead to high amounts of precipitation can be distinguished more easily from the non-precipitating clouds because of the significant temperature differences.

Results of the Comparison between the New Precipitation Area Delineation and IMERG's IR Only Precipitation
The best model was compared to the IR only product which uses only one spectral band for precipitation retrieval (see Table 7). Compared to IMERG's IR only, the new precipitation delineation based on Elektro-L2 and Insat-3D displayed a higher POD (0.4-0.59) than IR only (0.24 on average) which highly underestimates precipitation events. The POFD of IR only is comparably low. The FAR is extremely high (between 0.43-0.99) compared to the new precipitation delineation. The HSS and partly the PC of IR only produce lower values (HSS at 0.19 and PC at 0.84 on average for all months), which stresses the improvement of the new multispectral delineation.  Figure 8 shows an example of a scene from 7 July 2017 at 4:00 p.m. UTC. Grey represents the MW swath where the gauge calibrated MW precipitation is available and no clouds occur. The white areas are masked out due to a low quality index below 0.9 which indicates that these areas could not be filled with gauge calibrated MW precipitation and therefore are probably snow or ice covered. The white areas cannot be used for training and validation. The colours yellow (TN), red (FN), blue (FP) and green (TP) indicate the verification measures. The left graphic depicts the performance of the new precipitation area delineation based on the gauge calibrated MW precipitation from IMERG.
The graphic on the right shows the differences in the IR only and gauge calibrated MW precipitation product for 7 July 2017 at 4:00 p.m. UTC on the TiP as an example. It indicates that IR only does not capture the precipitating areas which occur in the gauge calibrated MW precipitation product (FN) and also produces some False Positives (FP) where no precipitation occurred.
Our findings show that our model performs better than IR only precipitation compared to IMERG's gauge calibrated MW precipitation.   Figure 9 distinguishes the modeling performance of IR only in low (<10th percentile), medium (between > 10th and < 90th percentile) and high (>90th percentile) precipitation rates. It reveals that high precipitation rates are easier to detect compared to low precipitation rates by IR only. The POFD is comparably low and the FAR decreases with high precipitation rates. The HSS increases with the increase in precipitation. The PC decreases in comparison from high to low precipitation rates. . Performance of the precipitation area delineation using IR only precipitation for July 2017 as boxplots for low, medium and high precipitation rates according to percentiles. The boxes display the 25th, 50th and 75th percentiles. Whiskers indicate extreme data up to 1.5 times of the interquartile range. Outliers are marked as crosses. The width of the boxes is relative to the available number of validation scenes.

Discussion
In this study we present the first rain area delineation on the TiP based on GEO satellite data which are fully validated across space and time.
Due to the missing IMERG independent gauge and weather radar network on the TiP, the variety of validation data are restricted. We therefore use the gauge calibrated MW precipitation from IMERG which was excluded from the feature selection, tuning and training processes to gain independent precipitation data.
Before modeling, we defined conditions for the selection of the data (see Section 2.3 RF model training and validation). When checking the data, we found pixels in the gauge calibrated MW precipitation of IMERG which are flagged as rainy but are not marked as cloudy in the Insat-3D cloud mask. Since they do not fulfill the condition of being cloudy in the cloud mask, they are excluded from training and validation. Some of these pixels were classified as probably cloudy in the cloud mask of Insat-3D. However, we decided to use only those pixels which were clearly classified as cloudy.
We also note that many pixels in all seasons, but especially in May and September, are frequently masked out due to IMERG's quality index, which, in these cases, is below 0.9. The snow and ice covered areas are not used for precipitation retrieval. The IMERG algorithm adds IR information when MW is not able to produce reliable estimates due to the high reflectance of snow and ice. The IR reduces the quality of the precipitation estimates, which are then excluded from training and validation.
Due to computational effort, we performed the feature selection and RF parameter tuning only once; however, the results might change when performing a feature selection for all of the different settings of the models.
There are several possibilities for the failure of the precipitation area delineation. We find that the detection of high precipitation amounts (above 90th percentile) is quite well captured by the new precipitation delineation, whereas low precipitation (below 10th percentile) performs poorly. The medium range of precipitation (between 10th and 90th percentile) is better (worse) captured (1) than precipitation below (above) the 10th (90th) percentile and (2) in the rainy season. Our results indicate that low precipitation rates are hard to capture using GEO satellite data. The difference of the brightness temperatures between precipitating and non-precipitating pixels, especially for low precipitation rates, is not inconsistent and leads to false results. In addition, a high FAR indicates that the model is not able to sufficiently differ between precipitation and no precipitation. High precipitation rates are especially important regarding disaster management and water availability for humans.
Adding static predictors did not change our results. We assume that, due to the rather coarse monthly basis, the feature selection did not include many of the static predictors, which, therefore, did not change the results much.
Our study shows that precipitation area delineation works best in the rainy months (July, August). We found quite a poor performance for the rather dry months May and September. The precipitating areas in June are captured with intermediate accuracy.
We tested four different temporal resolutions and found that scene based trained models performed best. Coarse temporal resolutions like monthly trained models instead lead to worse results. Daily and weekly trained models performed at an intermediate level, with a better tendency of daily models to capture precipitation. Furthermore, we tested four different balancing methods when training the model. We found that there are differences in the performance when changing the balance of rainy and non-rainy pixels. Both tests changed the results which indicates that the data input highly affects the prediction outcome. Depending on the modeling approach, the training data should be selected carefully.
We use IMERG's gauge calibrated MW precipitation for training and validation because it is the most reliable satellite precipitation product within IMERG [28]. However, it is restricted to those areas where MW is available. The gaps, where MW is not available, are filled in IMERG with IR based on one single band from GEO satellites. The indirect relation between IR and precipitation leads to an underestimation of precipitation. However, the IR is more frequently available compared to the MW precipitation. IMERG's IR only product is not able to capture precipitation precisely.
We showed that our precipitation area delineation outperforms the IR only precipitation from IMERG. This is due to the very low POD (missing precipitation) and the very high FAR in the IR only product. We therefore conclude that multispectral IR information enhances the precipitation area classification results. Figure 10 displays the distribution of the validation measures covering the available data in 2017 (Figure 10a-d). There is hardly any pattern visible for TP, FP and FN, but the TN are found to mostly occur in the south of the TiP. The western part of the TiP contains frequently masked out areas covered with snow and ice leading to data gaps in all verification measures. The gauge calibrated MW precipitation totals (Figure 10e) and the frequency of gauge calibrated MW precipitation (Figure 10f) capture the dry western and some northern parts of the TiP, whereas the south and southeast are moist and influenced by (1) orographic effects and (2) the monsoon. Other studies also focus on the classification of precipitating areas. To this end, the Meteosat Second Generation (MSG) Spinning Enhanced Visible and InfraRed Imager (SEVIRI) data with its resolution of 3 km and 15 min or the Himawari-8 with its 2 km and 15 min resolution are used several times for various areas [15][16][17][18][19][20][21]. Thies et al. [17] discriminate raining from non-raining cloud areas over Germany using MSG SEVIRI with reference to a German Weather radar product and found for night-time a POD of 0.6, a POFD of 0.2 and a FAR of 0.5. They also find comparable results for daytime. In their study about precipitation process and rainfall intensity differentiation, Thies et al. [18] use the detection of the precipitation areas together with the precipitation process to assign the precipitation rates. The POD varies between 0.1-0.7, whereas the POFD varies between 0-0.6, the FAR is between 0.3-0.9 and the HSS does not exceed 0.2. Kühnlein et al. [16] derive precipitation estimates from MSG SEVIRI using RF and find a POD of 0.5, POFD of 0.15, FAR of 0.5 and HSS of 0.2-0.5 for their classification of precipitating areas in Germany. Mapping rainfall over southern Africa using MSG SEVIRI lead to a validation aggregation of 1 h to a POD of 0.6, a low POFD of 0.18, but a high FAR of 0.8 and a low HSS of 0.18 [20].
Precipitating clouds were classified using IR from Himawari-8 where an enormous high POD of 0.98 and a high HSS of 0.2-0.8 is found, whereas the FAR does not exceed 0.01 [22]. Min et al. [21] estimate summertime precipitation from Himawari-8 and determine a POD of 0.58, a FAR between 0.27-0.41 and a HSS of 0.53. The TiP is especially challenging for satellite based precipitation retrieval techniques. This is due to the high elevation that causes low surface temperatures. These low surface temperatures hardly differ from the temperature of the clouds. This is problematic for the IR based techniques. On the other side, the high elevation and the semi-arid conditions are problematic for PMW sensors, since they often detect the ground signal in these cases which leads to misinterpretations in the retrieval algorithms. The same holds true for snow and ice covered areas which are very common on the TiP. Even if we use the best available data source within the IMERG product by choosing the MW based precipitation information as a reference for our technique, one has to keep in mind that this rainfall information should not be considered as the "truth" because of the described uncertainties.
GPM rainfall estimates are evaluated against gauge networks by Tan et al. [64] and their evaluation results in a POD of 0.7, FAR of 0.3-0.8 and a HSS of 0.3-0.7, whereas the data were evaluated on a precipitation threshold of 0.2 mm/hr. Everything below 0.2 mm/hr is not considered as precipitation because they state that small precipitation rates cannot be captured by satellites. Table 8 summarizes the validation metrics of other studies compared to our results. We use two GEO satellites to retrieve the precipitating area over the TiP. Future enhanced GEO systems from the Elektro-L series with a variety of spectral bands which can be operationally used will enhance the retrieval of precipitation over the TiP. In addition, possible minimal spatial offsets when superimposing the three different satellite products (Elektro-L2, Insat-3D, GPM IMERG) might be reduced. In addition, other studies indicate this issue [55,65].
The gauge calibrated MW precipitation combines imager and sounder which were all calibrated to the GPM core instrument (GMI). However, there are differences in the performance of precipitation detection in each sensor and especially in the differentiation between imager and sounder. There are two imagers (GMI, AMSR) and three sounders (ATMS, MHS, SSMIS) available in the V05B IMERG version. A sounder might perform worse than an imager, which then leads to a false classification [28]. The microwave humidity sounder (MHS) is known to strongly overestimate low rain rates and underestimate moderate intensities [28,36]. The MHS and the advanced temperature and moisture sounder (ATMS) are less accurate than the conical-scanning sensors like the GMI [66]. Since we would lose too much data when removing the sounder, we keep all data.

Conclusions
The lack of reliable precipitation data enhances the need for satellite-based precipitation retrievals for this region. Thus far, no precipitation retrieval especially for the TiP is available in a high spatio-temporal resolution. The aim of the study is to model the precipitation area delineation on the TiP using multispectral data from two GEO satellites (Insat-3D, Elektro-L2) with a machine learning approach (RF) based on the best quality gauge calibrated MW precipitation from IMERG. We showed that the precipitation area detection can be improved by (1) changing the ratio of the precipitating or non-precipitating data during training, and (2) by changing the temporal resolution to scene based trained models. Adding static predictors in the training process did not improve the precipitation area detection due to the coarse temporal resolution of the feature selection. We found a strong improvement of our multispectral precipitation area delineation compared to IMERG's single band IR only precipitation estimate with reference to the gauge calibrated MW precipitation. Our multispectral approach overcomes the main weakness of IMERG's IR only precipitation product, which underestimates precipitation [28].
Author Contributions: C.K., B.T. and J.B. conceived and designed the experiments; S.E., L.L. and H.M.S. helped with the preprocessing of the data. C.K. performed the experiments and C.K. and B.T. analyzed the data; B.T. and J.B. contributed materials and analysis tools; C.K. wrote the paper.
Funding: This study was conducted within the project "Precipitation patterns, snow and glacier response in High Asia and their variability on sub-decadal time scales-Remote Sensing of precipitation" (prime-RS), which is funded by the German Research Foundation (DFG projects: BE1780/46-1 and TH1531/6-1).

Acknowledgments:
The GPM IMERG (Version B05) data were kindly provided by the NASA/Goddard Space Flight Center's PMM Science Team and PPS, which develop and compute the IMERG Final Run Level 3 as a contribution to GPM, and archived at the NASA GES DISC. We thank SRC Planeta of Roshydromet for providing the Elektro-L2 data and the MOSDAC for providing the Insat-3D data. We kindly thank two anonymous reviewers for their valuable suggestions on a previous version of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: