# Genetic Programming for the Downscaling of Extreme Rainfall Events on the East Coast of Peninsular Malaysia

## Abstract

## 1. Introduction

## 2. Methodology

#### 2.1. Data and Sources

(**a**) Location of the rain gauge stations in Peninsular Malaysia; (**b**) terrain map of the study area.

Index | Description | Unit |
---|---|---|
R90 | Total number of days during NE monsoon in a year with rainfall ≥ 90th percentile of 1961–1990 daily rainfall | day |
CDD | Maximum number of consecutive dry days (rainfall = 0) in a year | day |
CWD | Maximum number of consecutive wet days (rainfall > 0) in a year | day |
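The three index definitions above translate directly into counting operations on a daily rainfall series. The sketch below is illustrative only: the function names, the sample values and the threshold of 10.0 mm are assumptions, and the caller is expected to supply the 90th percentile threshold (derived from the 1961–1990 record) and to subset the series to NE monsoon days before calling `r90`.

```python
from itertools import groupby


def longest_run(flags):
    """Return the length of the longest run of True values."""
    best = 0
    for is_set, group in groupby(flags):
        if is_set:
            best = max(best, sum(1 for _ in group))
    return best


def cdd(daily_rain):
    """Maximum number of consecutive dry days (rainfall = 0)."""
    return longest_run(r == 0 for r in daily_rain)


def cwd(daily_rain):
    """Maximum number of consecutive wet days (rainfall > 0)."""
    return longest_run(r > 0 for r in daily_rain)


def r90(monsoon_rain, threshold):
    """Number of days with rainfall at or above the given threshold.

    `threshold` is assumed to be the 90th percentile of the 1961-1990
    daily rainfall, computed separately by the caller.
    """
    return sum(1 for r in monsoon_rain if r >= threshold)


# Hypothetical 12-day sample (mm/day), for illustration only.
sample = [0, 0, 5.2, 12.0, 0, 0, 0, 30.5, 1.0, 2.5, 8.0, 0]
sample_cdd = cdd(sample)
sample_cwd = cwd(sample)
sample_r90 = r90(sample, threshold=10.0)
```

Passing the threshold in as an argument keeps the percentile convention (for example, whether dry days are included in the baseline) outside the index functions, since this section does not specify it.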

#### 2.2. Selection of Predictors

No. | Variables | Description | No. | Variables | Description |
---|---|---|---|---|---|
1 | mslp | Mean sea level pressure | 14 | p5zh | 500 hPa divergence |
2 | p_f | Surface airflow strength | 15 | p8_f | 850 hPa airflow strength |
3 | p_u | Surface zonal velocity | 16 | p8_u | 850 hPa zonal velocity |
4 | p_v | Surface meridional velocity | 17 | p8_v | 850 hPa meridional velocity |
5 | p_z | Surface vorticity | 18 | p8_z | 850 hPa vorticity |
6 | p_th | Surface wind direction | 19 | p800 | 850 hPa geopotential height |
7 | p_zh | Surface divergence | 20 | p8th | 850 hPa wind direction |
8 | p5_f | 500 hPa airflow strength | 21 | p8zh | 850 hPa divergence |
9 | p5_u | 500 hPa zonal velocity | 22 | rhum | Near surface relative humidity |
10 | p5_v | 500 hPa meridional velocity | 23 | r500 | Relative humidity at 500 hPa |
11 | p5_z | 500 hPa vorticity | 24 | r850 | Relative humidity at 850 hPa |
12 | p500 | 500 hPa geopotential height | 25 | shum | Near surface specific humidity |
13 | p5th | 500 hPa wind direction | 26 | temp | Mean temperature |

#### 2.3. Statistical Downscaling Model

#### 2.4. Statistical Downscaling Using Multilayer Perceptron Artificial Neural Network

_{n}) for a dataset. Unfortunately, the techniques used to estimate the regression coefficients in linear regression cannot be applied to logistic regression [18]. Traditionally, a stepwise regression procedure combined with the maximum likelihood method is used to identify significant predictors and their contribution to the probability of the target variable. The main disadvantage of stepwise regression with forward selection is that it often results in a biased selection of significant predictors [58]. Several methods have been proposed to overcome this problem, such as ridge regression, the least absolute shrinkage and selection operator (LASSO) [59] and the elastic net [60]. However, these techniques often fail to infer sparse models or behave undesirably in the presence of highly correlated predictors [58]. Recently, GP has been proposed to overcome these inherent difficulties of logistic regression. Biesheuvel et al. [30] compared the performance of GP with logistic regression in diagnosing pulmonary embolism and reported that, although a GP model is less intuitive to interpret, it is a promising technique for developing prediction rules for diagnostic and prognostic purposes. Engoren et al. [31] reached a similar conclusion and reported that GP can improve the prediction accuracy of logistic regression. The application of GP-based logistic regression has increased in recent years in different fields of science and technology [61,62,63,64].
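For orientation, the contrast between ordinary logistic regression and a GP-based variant can be written as below. This is a generic textbook-style formulation, not necessarily the notation used in this paper: the symbols $\beta_i$, $x_i$, $z$ and $f$ are assumptions introduced here for illustration.

```latex
p(y = 1 \mid x_1, \dots, x_n) = \frac{1}{1 + e^{-z}}, \qquad
z_{\mathrm{LR}} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i, \qquad
z_{\mathrm{GP}} = f(x_1, \dots, x_n)
```

In the linear case the coefficients $\beta_i$ are estimated by maximum likelihood, whereas in the GP case the functional form $f$ itself is evolved, so predictor selection and model structure are searched together.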

## 3. Results

#### 3.1. Selection of Predictors

(**a**) The NCEP variables from different grid points with good correlation with heavy rainfall events during the NE monsoon; (**b**) the plot of regression coefficients between different subsets of NCEP variables and heavy rainfall events at Dungun station.

Event | Predictor | Code | Description |
---|---|---|---|
90th percentile rainfall event | P1 | Cd(23) | Relative humidity at 500 hPa at grid point Cd |
90th percentile rainfall event | P2 | Db(2) | Surface airflow strength at grid point Db |
90th percentile rainfall event | P3 | Dc(3) | Surface zonal velocity at grid point Dc |
Rainfall event | P1 | Cd(23) | Relative humidity at 500 hPa at grid point Cd |
Rainfall event | P2 | Dd(24) | Relative humidity at 850 hPa at grid point Dd |
Rainfall event | P3 | Db(3) | Surface zonal velocity at grid point Db |

#### 3.2. Downscaling Using GP

#### 3.2.1. Downscaling Heavy Rainfall Days

**Table 4.** Overall hit rates during training and validation of genetic programming (GP) models in downscaling rainfall indices.

Rainfall Indices | Station Name | Hit Rate (%), Training | Hit Rate (%), Validation |
---|---|---|---|
Days with larger than or equal to 90th percentile rainfall | Besut | 81.1 | 78.0 |
Days with larger than or equal to 90th percentile rainfall | Dungun | 80.1 | 76.8 |
Days with larger than or equal to 90th percentile rainfall | Kemaman | 78.3 | 75.9 |
Rainy days | Besut | 86.1 | 82.0 |
Rainy days | Dungun | 84.5 | 80.2 |
Rainy days | Kemaman | 83.9 | 81.0 |
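A hit rate of this kind can be computed as the share of days on which the model's event/no-event classification matches the observation. The sketch below assumes that definition, which is not stated in this section; the function name and the ten-day sample are hypothetical.

```python
def hit_rate(observed, predicted):
    """Percentage of days on which predicted and observed classes agree.

    Assumes both sequences hold one binary label per day
    (1 = event occurred, 0 = no event).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one day is required")
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * hits / len(observed)


# Hypothetical ten-day sample, for illustration only.
obs = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
sample_hit_rate = hit_rate(obs, pred)
```

For rare events such as days above the 90th percentile, a score of this form is dominated by correctly predicted non-event days, which is worth keeping in mind when comparing the two index groups.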

Rainfall Indices | Station Name | Equation |
---|---|---|
Days with larger than or equal to 90th percentile rainfall | Besut | $-1.28[P_{2} - P_{1} + 1.258[P_{2} - P_{1} - 1.36[P_{2} + P_{3}]]] \times [P_{2} - P_{1}]$ |
Days with larger than or equal to 90th percentile rainfall | Dungun | $[P_{2} + P_{1} - 2.56[P_{2} + P_{1} - 3.26[P_{1} + P_{2} + P_{3}]]] \times 1.23P_{2}$ |
Days with larger than or equal to 90th percentile rainfall | Kemaman | $-1.11[P_{2} - P_{1} + 1.56[P_{2} - P_{1} - 1.45[P_{2} + P_{3}]]] \times [P_{2} - P_{1} + P_{3}]$ |
Rainy days | Besut | $(P_{3} - P_{1} + P_{2}) - 1.51(P_{3} \times P_{2} - P_{1}) + 1.14(P_{1} - P_{3} + \sqrt{P_{1}})$ |
Rainy days | Dungun | $1.23 \times (P_{1} + P_{2}) - 1.26(P_{3} \times P_{2} - P_{1}) + 0.86(P_{1} - P_{2})$ |
Rainy days | Kemaman | $2.34 \times P_{2} - 2.13(P_{3} \times 1.54 - P_{1}) + 1.45(P_{1} - P_{3} \times P_{1})$ |
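As an illustration of how one of these evolved expressions could be applied, the sketch below evaluates the Dungun rainy-day equation for a single day. The equation itself is taken from the table; everything else is an assumption made for the example: that the predictors are already scaled as the model expects, that the score is passed through a logistic function, and that 0.5 is the decision threshold. None of these choices is confirmed by this section.

```python
import math


def gp_score_dungun_rainy(p1, p2, p3):
    """Dungun rainy-day expression from the table of GP equations."""
    return 1.23 * (p1 + p2) - 1.26 * (p3 * p2 - p1) + 0.86 * (p1 - p2)


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def is_rainy(p1, p2, p3, threshold=0.5):
    """Classify a day as rainy (assumed logistic link and threshold)."""
    return logistic(gp_score_dungun_rainy(p1, p2, p3)) >= threshold


# Hypothetical predictor values, for illustration only.
wet_score = gp_score_dungun_rainy(0.5, 0.2, -0.3)
wet_day = is_rainy(0.5, 0.2, -0.3)
dry_day = is_rainy(-1.0, 0.0, 0.0)
```

With a 0.5 threshold the logistic step is equivalent to checking the sign of the score; it is kept here only because it yields a probability that could be thresholded differently.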

#### 3.2.2. Downscaling Consecutive Wet and Dry Days

#### 3.3. Downscaling Using ANN

#### 3.4. Downscaling Using the SDSM

**Figure 6.** Observed number of heavy rainfall days and those downscaled by GP, the ANN and the statistical downscaling model (SDSM) during model validation at: (**a**) Besut; (**b**) Dungun; and (**c**) Kemaman.

**Figure 7.** Observed number of consecutive wet days and those downscaled by GP, the ANN and the SDSM during model validation at: (**a**) Besut; (**b**) Dungun; and (**c**) Kemaman.

#### 3.5. Comparison of Results

The RMSE and the squared correlation coefficient ($r^{2}$) between observed and downscaled values during validation are given in Table 6. The table shows that GP estimates the number of heavy rainfall days with a lower RMSE than the ANN and the SDSM at all three stations (1.08–1.14 for GP, against 1.31–1.83 for the ANN and 2.16–2.69 for the SDSM). The $r^{2}$ between observed and GP-downscaled heavy rainfall days during the validation period (0.67–0.75) is also higher than that obtained with the ANN (0.58–0.61) and the SDSM (0.41–0.51).

**Figure 8.** Observed number of consecutive dry days and those downscaled by GP, the ANN and the SDSM during model validation at: (**a**) Besut; (**b**) Dungun; and (**c**) Kemaman.

Indices | Station | GP RMSE | GP $r^{2}$ | ANN RMSE | ANN $r^{2}$ | SDSM RMSE | SDSM $r^{2}$ |
---|---|---|---|---|---|---|---|
90th percentile rainfall days | Besut | 1.08 | 0.75 | 1.31 | 0.58 | 2.16 | 0.43 |
90th percentile rainfall days | Dungun | 1.14 | 0.73 | 1.83 | 0.58 | 2.69 | 0.41 |
90th percentile rainfall days | Kemaman | 1.13 | 0.67 | 1.62 | 0.61 | 2.57 | 0.51 |
Consecutive wet days | Besut | 1.02 | 0.88 | 1.73 | 0.80 | 1.95 | 0.65 |
Consecutive wet days | Dungun | 1.05 | 0.89 | 1.54 | 0.83 | 1.74 | 0.61 |
Consecutive wet days | Kemaman | 1.10 | 0.96 | 1.66 | 0.92 | 2.20 | 0.46 |
Consecutive dry days | Besut | 1.06 | 0.83 | 1.55 | 0.73 | 1.99 | 0.67 |
Consecutive dry days | Dungun | 1.04 | 0.91 | 1.65 | 0.87 | 2.28 | 0.85 |
Consecutive dry days | Kemaman | 1.23 | 0.82 | 1.60 | 0.82 | 2.26 | 0.77 |
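The two skill measures reported above can be reproduced from paired observed and downscaled series. The sketch below assumes that $r^{2}$ is the squared Pearson correlation coefficient, which this section does not state explicitly; the function names and the five-year sample are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between two equally long series."""
    if len(observed) != len(estimated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(total / len(observed))


def r_squared(observed, estimated):
    """Squared Pearson correlation between two equally long series."""
    if len(observed) != len(estimated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov ** 2 / (var_o * var_e)


# Hypothetical annual counts (days), for illustration only.
obs_days = [3, 5, 2, 6, 4]
est_days = [4, 5, 1, 6, 3]
sample_rmse = rmse(obs_days, est_days)
sample_r2 = r_squared(obs_days, est_days)
```

Because the squared correlation is insensitive to a constant bias or a scaling error, it is best read together with the RMSE, as the table does.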

**Figure 9.** Monthly/seasonal distribution of: (**a**) heavy rainfall days, (**b**) consecutive wet days and (**c**) consecutive dry days at Besut; (**d**) heavy rainfall days, (**e**) consecutive wet days and (**f**) consecutive dry days at Dungun; and (**g**) heavy rainfall days, (**h**) consecutive wet days and (**i**) consecutive dry days at Kemaman.

**Table 7.** The RMSE in downscaled seasonal extreme indices using GP, the ANN and the SDSM during model validation.

Indices | Station | GP | ANN | SDSM |
---|---|---|---|---|
90th percentile rainfall days | Besut | 1.66 | 2.18 | 3.16 |
90th percentile rainfall days | Dungun | 1.68 | 2.35 | 2.45 |
90th percentile rainfall days | Kemaman | 1.80 | 3.12 | 3.54 |
Consecutive wet days | Besut | 5.02 | 6.80 | 6.96 |
Consecutive wet days | Dungun | 3.81 | 5.15 | 6.32 |
Consecutive wet days | Kemaman | 2.92 | 6.96 | 7.91 |
Consecutive dry days | Besut | 1.80 | 3.54 | 4.95 |
Consecutive dry days | Dungun | 2.06 | 4.03 | 5.70 |
Consecutive dry days | Kemaman | 2.69 | 4.72 | 5.70 |

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

