Rainfall Threshold Estimation and Landslide Forecasting for Kalimpong, India Using SIGMA Model

: Rainfall-induced landslides are among the most devastating natural disasters in hilly terrains and the reduction of the related risk has become paramount for public authorities. Between the several possible approaches, one of the most used is the development of early warning systems, so as the population can be rapidly warned, and the loss related to landslide can be reduced. Early warning systems which can forecast such disasters must hence be developed for zones which are susceptible to landslides, and have to be based on reliable scientiﬁc bases such as the SIGMA (sistema integrato gestione monitoraggio allerta—integrated system for management, monitoring and alerting) model, which is used in the regional landslide warning system developed for Emilia Romagna in Italy. The model uses statistical distribution of cumulative rainfall values as input and rainfall thresholds are deﬁned as multiples of standard deviation. In this paper, the SIGMA model has been applied to the Kalimpong town in the Darjeeling Himalayas, which is among the regions most a ﬀ ected by landslides. The objectives of the study is twofold: (i) the deﬁnition of local rainfall thresholds for landslide occurrences in the Kalimpong region; (ii) testing the applicability of the SIGMA model in a physical setting completely di ﬀ erent from one of the areas where it was ﬁrst conceived and developed. To achieve these purposes, a calibration dataset of daily rainfall and landslides from 2010 to 2015 has been used; the results have then been validated using 2016 and 2017 data, which represent an independent dataset from the calibration one. The validation showed that the model correctly predicted all the reported landslide events in the region. Statistically, the SIGMA model for Kalimpong town is found to have 92% e ﬃ ciency with a likelihood ratio of 11.28. This performance was deemed satisfactory, thus SIGMA can be integrated with rainfall forecasting and can be used to develop a landslide early warning system.


Introduction
In a global database of landslide disasters given by Froude and Petley (2018) [1], three-quarters of all landslide events between 2004 to 2016 occurred in Asian countries, with substantial events in the Himalayas. Indian Himalayas are highly susceptible to landslides which are triggered primarily

Study Area and Input Data
Kalimpong town is a part of the Kalimpong district of West Bengal state, India, as shown in Figure 1. This hilly town belongs to Darjeeling Himalayas, hemmed between rivers Tista in the west and Relli in the east, with an elevation ranging from 355 m to 1646 m above mean sea level. The slopes in the western face of the town are steep, while the eastern slopes are gentle.
The geological setting of the region is associated with the evolution of Darjeeling Himalayan ranges. Precambrian high-grade gneiss and quartzite, calc-silicate and quartzite, high-grade schist phyletic etc. are the dominant rock types found in the region [28]. Upper sedimentary layers of the young folded mountains get eroded during heavy rainfalls. The area consists of several joints and cracks that accelerates the weathering of the rock and the formation of unconsolidated matter [29]. The bedrock throughout the study area is composed of Daling series quartz mica schist of golden to silver colors [30]. The inclination of bed towards the east and northeast varies from 20 • near river Tista to about 40 • towards town. These slopes in can be morphometrically classified into escarpment category A (>45 • ), steep slope category B (30 • -45 • ), moderate steep slope category C (20 • -30 • ) and gentle slope category D (10 • -20 • ). Silt to medium grained sand and loam constitutes a major portion of the topsoil of the area. According to GSI, more than 60% of the region comprises colluvium followed by older debris (24%) and young debris (2.5%). This geomorphological setting makes the region very prone to landslides, and rainfall is the main triggering factor.
Water 2020, 12, x FOR PEER REVIEW 3 of 13 of the topsoil of the area. According to GSI, more than 60% of the region comprises colluvium followed by older debris (24%) and young debris (2.5%). This geomorphological setting makes the region very prone to landslides, and rainfall is the main triggering factor. The geology of the area allows rainwater to percolate, increasing the pore pressure, therefore the shear strength of the soil decreases. The change in water content due to intense rainfall leads to the saturation of material and a sudden increase in the unit weight. This mechanism reduces the stability and resistance of parent rocks. The average annual precipitation in this area was observed to be 1872 mm during the study period, and the drainage density of the region is also very high. The area is drained by numerous mountainous natural streams (kholas) and their tributaries (jhoras). The precipitation with daily accuracy was collected for this study from the rain gauge maintained in Tirpai, Kalimpong (Save The Hills). The months from June to September are considered a monsoon period and the monthly rainfall from 2010 to 2017 is given in Table 1.   June  317  337  355  248  396  568  327  154  July  666  678  433  424  371  534  870  812  August  425  526  251  401  572  242  263  432  September 268  384  467  113  265  331  367  288 A landslide catalogue was prepared from the reports of the Geological Survey of India, newspapers and field surveys. The database contains the spatial and temporal distribution of rainfallinduced landslide events during 2010-2017. The dataset from 2010 to 2015 was used for model calibration and the dataset from 2016 to 2017 was used for model validation. The major fatal landslides happened in the region were shallow and rapid in nature, but there are some areas which experience continuous sinking because of slow, deep-seated movements, especially near major jhoras [32]. The movements are occurring gradually and are observed as cracks in buildings and roads after each monsoon. Since 2017, these slow movements are monitored using micro-electro-mechanical tilt The geology of the area allows rainwater to percolate, increasing the pore pressure, therefore the shear strength of the soil decreases. The change in water content due to intense rainfall leads to the saturation of material and a sudden increase in the unit weight. This mechanism reduces the stability and resistance of parent rocks. The average annual precipitation in this area was observed to be 1872 mm during the study period, and the drainage density of the region is also very high. The area is drained by numerous mountainous natural streams (kholas) and their tributaries (jhoras). The precipitation with daily accuracy was collected for this study from the rain gauge maintained in Tirpai, Kalimpong (Save The Hills). The months from June to September are considered a monsoon period and the monthly rainfall from 2010 to 2017 is given in Table 1.   June  317  337  355  248  396  568  327  154  July  666  678  433  424  371  534  870  812  August  425  526  251  401  572  242  263  432  September  268  384  467  113  265  331  367  288 A landslide catalogue was prepared from the reports of the Geological Survey of India, newspapers and field surveys. The database contains the spatial and temporal distribution of rainfall-induced landslide events during 2010-2017. The dataset from 2010 to 2015 was used for model calibration and the dataset from 2016 to 2017 was used for model validation. The major fatal landslides happened in the region were shallow and rapid in nature, but there are some areas which experience continuous sinking because of slow, deep-seated movements, especially near major jhoras [32]. The movements are occurring gradually and are observed as cracks in buildings and roads after each monsoon. Since 2017, these slow movements are monitored using micro-electro-mechanical tilt sensors installed at Chibo [33]. During the validation period (years 2016-2017), ground displacements were observed on 7 days at two locations [2].
The annual cumulative rainfall for these years is plotted in Figure 2a and the temporal distribution of landslides along with the average rainfall is shown in Figure 2b. It is observed that the number of landslide events is maximum in the month of July where the rainfall peak is recorded. From Figure 2b, it is clear that the number of landslides is directly related to the rainfall amount. Landslides are becoming an increasing menace in the region during monsoon season. The havocs related to landslides have multilevel impacts on the livelihood of population. Loss of farm lands and disruption of roads are affecting the income sources of the people. The socioeconomic development of the region is throttled by the disasters and associated setbacks. Hence, it is critical to adopt measures to minimize the impact of landslides in the region. An effective approach is to develop an early warning system using rainfall thresholds to forecast the occurrence of landslides. Since the area is affected by landslides of mixed typology (rapid shallow slides and slow deep seated movements), we took into account the SIGMA model [23], specifically conceived for similar settings, and we customized it for an application in the study area.
Water 2020, 12, x FOR PEER REVIEW 4 of 13 sensors installed at Chibo [33]. During the validation period (years 2016-2017), ground displacements were observed on 7 days at two locations [2]. The annual cumulative rainfall for these years is plotted in Figure 2a and the temporal distribution of landslides along with the average rainfall is shown in Figure 2b. It is observed that the number of landslide events is maximum in the month of July where the rainfall peak is recorded. From Figure 2b, it is clear that the number of landslides is directly related to the rainfall amount. Landslides are becoming an increasing menace in the region during monsoon season. The havocs related to landslides have multilevel impacts on the livelihood of population. Loss of farm lands and disruption of roads are affecting the income sources of the people. The socioeconomic development of the region is throttled by the disasters and associated setbacks. Hence, it is critical to adopt measures to minimize the impact of landslides in the region. An effective approach is to develop an early warning system using rainfall thresholds to forecast the occurrence of landslides. Since the area is affected by landslides of mixed typology (rapid shallow slides and slow deep seated movements), we took into account the SIGMA model [23], specifically conceived for similar settings, and we customized it for an application in the study area.

The SIGMA Model
The SIGMA model was developed for the Emilia-Romagna region in Italy [23]. This model uses the standard deviation of a statistical distribution as the key parameter for the analysis and defines thresholds as a function of standard deviation, predicting the potential of rainfall to initiate landslide events in the study area. Since the model is based on statistical distribution, it is easily exportable to other regions [23]. Adopting the methodology from Martelloni et al. (2012) [23], a customized SIGMA model for Kalimpong town is derived in this study. The modifications are in accordance with the historical database collected for the study area, thus making SIGMA compatible for a different hydro-meteo-geological setting than the area for which it is originally developed. The daily precipitation data were added at 'n' days, with an 'n' day wide shifting window which moves at everyday time steps throughout rainfall data. The values of 'n' will vary from 1 to 365. To calculate the cumulative probability distribution for each data set, a standard distribution, which is the target function is chosen as a model [34].This transformation relates the cumulative rainfall (z) with the target distribution (y = a,σ) ('σ' is the standard deviation of the series and 'a' is a multiplication constant). For each 'n' day cumulative rainfall series, the values are sorted in ascending order such that and a cumulative sample frequency is defined as where, 1 ≤ k ≤ n.
The conversion is carried out using the cumulative distribution function of z, termed as F(z). For each value of z k , F(z k ) defines the probability that the variable z takes a value less than z k , where k varies between 1 to n.
The original data z transformed to y is obtained as: After applying the transformation function, from a particular value of standard deviation or its multiple, cumulative sample frequency and precipitation can be calculated. The same procedure is repeated for all values of n from 1 to 365 and precipitation curves (σ curves) are plotted. The probability curves derived are used as the input values in the algorithm. A level of warning is predicted for everyday based on the rainfall thresholds. Rainfall recordings were cumulated with one day time steps for a particular time interval. These values are compared with the precipitation curves, from shorter to longer time frames [23]. In the case of shallow landslides, the analysis should focus on the immediate effect of rainfall: the cumulative rainfall values up to two days before the day of analysis is considered. The decisional algorithm used is given in Equation 4: where, ∆ = a,σ, C 1-3 is the vector indicating the cumulated rainfall at time t and S n (∆) are the thresholds relative to number of days n and ∆ [23]. In the case of slow movements, the algorithm ponders the effect of cumulative rainfall from 4 days up to 63 days [23]. The condition for crossing the threshold is given by: The definitions of vector C are kept the same and have been used in the study for the analysis. The analysis was carried out in the same method proposed by the developers of the SIGMA model, to define the thresholds for Kalimpong town.

Analysis
The rainfall and landslide data for Kalimpong town have been used to develop (2010-2015) and validate (2016 and 2017) rainfall thresholds for the region. The spatial distribution of landslide events during the study period is shown in Figure 3.
Water 2020, 12, x FOR PEER REVIEW 6 of 13 The rainfall and landslide data for Kalimpong town have been used to develop (2010-2015) and validate (2016 and 2017) rainfall thresholds for the region. The spatial distribution of landslide events during the study period is shown in Figure 3. For each day, 'n'-day cumulative rainfall values were calculated with n ranging from 1 to 365. Cumulative probability distribution curves were plotted after sorting the values in ascending order. For small values of 'n', the distributions were found to be closer to log-normal, and for higher values of 'n', the distributions tend towards normal. The asymmetric distribution of data sets has been observed by other researchers as well [23]. Choosing Gaussian distribution as a target function, cumulative values corresponding to multiples of SIGMA were calculated by applying the transformation, as shown in Figure 4a.
After applying the transformation, a probability of not overcoming a particular 'aσ' value can be calculated using the reverse procedure. For each value of 'aσ', cumulative values for n-days varying from 1 to 365 were plotted as SIGMA curves. The values of standard curves were initially taken as 1.5σ, 1.75σ, 2σ and 2.5σ and are plotted in Figure 4b. For each day, 'n'-day cumulative rainfall values were calculated with n ranging from 1 to 365. Cumulative probability distribution curves were plotted after sorting the values in ascending order. For small values of 'n', the distributions were found to be closer to log-normal, and for higher values of 'n', the distributions tend towards normal. The asymmetric distribution of data sets has been observed by other researchers as well [23]. Choosing Gaussian distribution as a target function, cumulative values corresponding to multiples of SIGMA were calculated by applying the transformation, as shown in Figure 4a.
After applying the transformation, a probability of not overcoming a particular 'aσ' value can be calculated using the reverse procedure. For each value of 'aσ', cumulative values for n-days varying from 1 to 365 were plotted as SIGMA curves. The values of standard curves were initially taken as 1.5σ, 1.75σ, 2σ and 2.5σ and are plotted in Figure 4b. From the probability distribution plots, SIGMA curves have been combined using an algorithm, which is the crucial part of the SIGMA model. The algorithm defines four different levels of alert, such as "red", "orange", "yellow" and "green". These values are used to delineate exceptional rainfall values. The starting algorithm for the proposed model is as shown in Figure 5. It considers the effect of short-term rainfall first, and exceedance of threshold will give high criticality alert for the day. If a red alert case does not exist, first orange alert level and then yellow alert level were checked for each day. If the result is negative in all cases, absent criticality (green color) is defined for the day. Hence, if a landslide happens to continue for a number of days, an effective model should predict the corresponding warning level on each day. The block diagram proposed in Figure 5 has to be considered as a starting point for the work, since it was then calibrated as described in the following paragraphs. From the probability distribution plots, SIGMA curves have been combined using an algorithm, which is the crucial part of the SIGMA model. The algorithm defines four different levels of alert, such as "red", "orange", "yellow" and "green". These values are used to delineate exceptional rainfall values. The starting algorithm for the proposed model is as shown in Figure 5. It considers the effect of short-term rainfall first, and exceedance of threshold will give high criticality alert for the day. If a red alert case does not exist, first orange alert level and then yellow alert level were checked for each day. If the result is negative in all cases, absent criticality (green color) is defined for the day. Hence, if a landslide happens to continue for a number of days, an effective model should predict the corresponding warning level on each day. The block diagram proposed in Figure 5 has to be considered as a starting point for the work, since it was then calibrated as described in the following paragraphs. A threshold is considered to be exceeded if any of the elements in the vector crosses the value. Once a threshold is exceeded, the algorithm defines the level of warning on each day. These outputs were used to calibrate the model (data from 2010 to 2016). A trial and error procedure has been adopted in the optimization module of the algorithm, which relates the daily warning levels with landslide occurrences, as in Martelloni et al. (2012) [23]. The value of threshold is progressively raised so that false alarms are avoided. A visualization of the procedure is shown in Figure 6 where standard SIGMA value of 1.75 was optimized to 1.95. Using the same procedure, other standard values of 1.5 and 2 were optimized to 1.65, and 2.05, respectively. The standard value of 2.5 remained the same after optimization. The thresholds values were increased to minimize false alarms for each event, such that no true alarms are missed. The execution of this module terminates once the algorithm catches an event with an observed warning level conforming to the considered threshold. The standard SIGMA curves remain the same, but the calibration process gives a modified set of SIGMA curves for the region.  A threshold is considered to be exceeded if any of the elements in the vector crosses the value. Once a threshold is exceeded, the algorithm defines the level of warning on each day. These outputs were used to calibrate the model (data from 2010 to 2016). A trial and error procedure has been adopted in the optimization module of the algorithm, which relates the daily warning levels with landslide occurrences, as in Martelloni et al. (2012) [23]. The value of threshold is progressively raised so that false alarms are avoided. A visualization of the procedure is shown in Figure 6 where standard SIGMA value of 1.75 was optimized to 1.95. Using the same procedure, other standard values of 1.5 and 2 were optimized to 1.65, and 2.05, respectively. The standard value of 2.5 remained the same after optimization. The thresholds values were increased to minimize false alarms for each event, such that no true alarms are missed. The execution of this module terminates once the algorithm catches an event with an observed warning level conforming to the considered threshold. The standard SIGMA curves remain the same, but the calibration process gives a modified set of SIGMA curves for the region.  A threshold is considered to be exceeded if any of the elements in the vector crosses the value. Once a threshold is exceeded, the algorithm defines the level of warning on each day. These outputs were used to calibrate the model (data from 2010 to 2016). A trial and error procedure has been adopted in the optimization module of the algorithm, which relates the daily warning levels with landslide occurrences, as in Martelloni et al. (2012) [23]. The value of threshold is progressively raised so that false alarms are avoided. A visualization of the procedure is shown in Figure 6 where standard SIGMA value of 1.75 was optimized to 1.95. Using the same procedure, other standard values of 1.5 and 2 were optimized to 1.65, and 2.05, respectively. The standard value of 2.5 remained the same after optimization. The thresholds values were increased to minimize false alarms for each event, such that no true alarms are missed. The execution of this module terminates once the algorithm catches an event with an observed warning level conforming to the considered threshold. The standard SIGMA curves remain the same, but the calibration process gives a modified set of SIGMA curves for the region.

Results
For the validation of results, rainfall and landslide data of 2016 and 2017 have been used. For each day of the validation period, the level of warning predicted by the decisional algorithm was compared with the reported landslide events according to the classical scheme of a confusion matrix, as shown in Figure 7. A confusion matrix is used to describe the performance of a decisional algorithm on a validation dataset for which the true values are known. The performance of an algorithm can be visualized using this matrix.
Water 2020, 12, x FOR PEER REVIEW 9 of 13

Results
For the validation of results, rainfall and landslide data of 2016 and 2017 have been used. For each day of the validation period, the level of warning predicted by the decisional algorithm was compared with the reported landslide events according to the classical scheme of a confusion matrix, as shown in Figure 7. A confusion matrix is used to describe the performance of a decisional algorithm on a validation dataset for which the true values are known. The performance of an algorithm can be visualized using this matrix. Correct predictions can be both true positives (days in which the model forecasted correctly the occurrence of at least a landslide) and true negatives (days in which the model forecasted correctly that no landslides occurred). False negatives are those days in which the algorithm missed the alarms, and false positives are days in which the model issued false alarms. During 2016, eight shallow landslide events are reported in the region. The events were rapid in nature and happened to occur on a single day. Among the eight events, six were correctly predicted by the SIGMA model. Ground displacements were reported at two locations in the Chibo-Pashyor area during seven days in 2017: on 28th-29th July 2017 and 13th-17th August 2017 [2]. In all seven days, ordinary criticality was wellpredicted in the present analysis. 55 false alarms were forecasted in a span of two years. An overview of the quantitative validation of the model for Kalimpong is tabulated in Table 2.   Correct predictions can be both true positives (days in which the model forecasted correctly the occurrence of at least a landslide) and true negatives (days in which the model forecasted correctly that no landslides occurred). False negatives are those days in which the algorithm missed the alarms, and false positives are days in which the model issued false alarms. During 2016, eight shallow landslide events are reported in the region. The events were rapid in nature and happened to occur on a single day. Among the eight events, six were correctly predicted by the SIGMA model. Ground displacements were reported at two locations in the Chibo-Pashyor area during seven days in 2017: on 28th-29th July 2017 and 13th-17th August 2017 [2]. In all seven days, ordinary criticality was well-predicted in the present analysis. 55 false alarms were forecasted in a span of two years. An overview of the quantitative validation of the model for Kalimpong is tabulated in Table 2.

Discussion
The validation of the SIGMA model for the study area gave satisfactory results by predicting all slow movements events correctly and producing two missed alarms. However, this came at the cost of having a relatively high count of false alarms (22).
As pointed out by Lagomarsino et al. (2015) [27], the most complete form of validation for a threshold model is to compare the skill scores to the ones obtained with the application of other models in the same test site. Several rainfall thresholds for landslides in Kalimpong region has already been defined [2,10,15]. Therefore, the validation statistics of SIGMA were compared to those obtained in the same test site by two already published works, which make use of ED and ID thresholds, as shown in Table 3. It can be seen that during the validation period, SIGMA outperforms the other models. The terms for evaluating the overall performance of model, efficiency and likelihood ratio, are maximum for SIGMA among the models tested. Efficiency being the ratio of true predictions to a total number of predictions, does not show a significant variation amongst different models. This is often reported in LEWS [27] where the number of true negatives are of a higher order than the other three variables. Specificity measures the ratio of correctly predicted days with no landslides to the total number of days without landslides, and sensitivity denotes the ratio of correctly predicted landslides to the total number of landslides. Likelihood ratio is the ratio of sensitivity to 1-specificity, evaluating the effect of both the parameters. The main reason for the better performance of SIGMA seems to be the effectiveness in predicting the slow movements occurred in 2017: a technique based on detecting antecedent rainfall anomalies. SIGMA is more suited than ID and ED thresholds to forecast slope movements with a complex hydrological response [26]. ID and ED thresholds correctly predicted seven out of eight shallow landslides happened in 2016 but failed to forecast the slow movements in 2017 except for one day. It is also observed that the number of false alarms is also less in the SIGMA model, when compared with the other models. However, before having a definitive response on which would be the threshold model more effective to use in an EWS in Kalimpong, further tests are needed and a larger validation dataset needs to be accounted for.
In addition, the validation showed that SIGMA could need to be improved further, especially concerning the high number of false alarms. That was not a surprising outcome since in the calibration, the optimization procedure aimed at minimizing false negatives (missed alarms) instead of searching for a compromise between missed alarms and correct predictions, thus leading to a high number of false positives. A research direction worth exploring is testing different time intervals in the decisional algorithm: the one used in this research are the one resulted optimal for the Emilia-Romagna region (Italy), and they were defined after a long period of adjustments [26]. The different physical settings of Kalimpong allow for a different optimal set of SIGMA values and time intervals to be defined.

Conclusions
Forecasting of rainfall-induced landslides in Kalimpong town have been carried out using the SIGMA model and considering the historical rainfall and landslide information. A single parameter, the cumulative rainfall, defines the threshold by means of a set of statistical thresholds.
The algorithm was designed to consider a three day rainfall effect for shallow landslides and more days (up to 63) for slow landslides. The time period and standard SIGMA values were decided by trial and error procedure during calibration, minimizing missed alarms and false alarms. A validation procedure showed satisfactory results and proved that SIGMA performed better than other ED and ID thresholds defined for the same region by previous works. For the study area, where both rapid and slow movements are present, the combined use of short-term and long-term antecedent rainfall is thus a point of strength of the model.
It can be concluded that the SIGMA model is a simple and efficient tool which can be used for landslide early warning on regional scale. The model predicts warning levels associated with each day, which can be directly linked to the severity of landslide events predicted. This study proves that the SIGMA model can be exported in parts of the world other than Italy, where the model was originally conceived, with satisfactory performance.
While applying the SIGMA model for a study area different from Italy, in a different hydro-meteo-geological setting, it was found that the values of S n (∆) of Kalimpong is different from those used for Italy [23]. A simpler algorithm than the one used for Italy was found to provide optimum results, as a smaller area and single rain gauge is considered for the analysis. Being a statistical model, the starting algorithm was decided by trial and error using the meteorological data and was fine-tuned by minimizing false alarms using an optimization procedure. The algorithm correctly predicted warnings on 13 out of 15 days of landslides during the validation period (2016-2017).
The events in 2017 were the result of continuous rainfall over a longer time period. It can be concluded that this algorithm-based approach considers the effect of both long-term and short-term rainfall and even slow movements are predicted, providing a performance better than traditional ID and ED thresholds.
The number of false alarms generated has to be reduced either by tuning the SIGMA levels and the time interval lengths, or by improving the model conceptually. As an instance, physical parameters like soil moisture can be considered along with the rainfall data to increase the positive predictive power [18,25]. Also to expand the model for a larger area, spatial variability of meteorological parameters should be considered [36,37]. After further tests and developing standard action plans for each level of warning, this model has the potential to be integrated with rainfall forecasting and to be used as a landslide early warning system on a regional scale.