Evaluation of General Circulation Models over the Upper Ouémé River Basin in the Republic of Benin

This study assessed the performance of eight general circulation models (GCMs) used in the Fifth Assessment Report (AR5) on climate change over the upper Ouémé River basin in the Republic of Benin (West Africa). Historical rainfall simulations from the Rossby Centre regional climate model (RCA4), driven by eight Coupled Model Intercomparison Project Phase 5 (CMIP5) GCMs, were evaluated against observed data over a 55-year period (1951 to 2005). In addition to daily rainfall, other rainfall parameters calculated from the observed and simulated series were compared, using the U-test and other statistical criteria (R², MBE, MAE, RMSE and deviations of standard deviations). The results show that the simulations correctly reproduce the interannual variability of precipitation in the upper Ouémé River basin. However, the models tend to produce drizzle. In particular, the overestimation of April, May and November rainfall explains not only the overestimation of seasonal and annual cumulative rainfall but also the early onset and late withdrawal of the rainy season. The magnitude of this overestimation varies from one model to another. The models reproduce extreme rainfall indices poorly. The CanESM2, CNRM-CM5 and EC-EARTH models perform well for daily rainfall. As a trade-off across the different rainfall parameters, the MPI-ESM-LR, GFDL-ESM2M, NorESM1-M and CanESM2 models are selected for reliable projection of rainfall in the area. In particular, the MPI-ESM-LR model is a valuable tool for studying future climate change.


Introduction
Climate change resulting from anthropogenic greenhouse gas (GHG) emissions has adverse impacts on human lives, property and the environment. Protecting people from the resulting damage and managing the consequences of climate change require projections of future changes. General circulation models (GCMs) are the most reliable means of estimating future climate change as atmospheric greenhouse gas concentrations continue to increase significantly [1][2][3]. A climate model can be used to simulate past or future climate (climate projection). In addition to empirical laws and statistical adjustments, a climate model is based on fundamental physical laws (i.e., conservation of mass, energy and momentum) that can be expressed mathematically [4]. Because of the large number of processes and interactions between the atmosphere, ocean and biosphere, climate models are extremely complex, and this sophisticated tool is limited by the computing power and time available today [5]. Thus, it does

Study Site
Benin is a West African country with an estimated population of 9,983,884 inhabitants [16], located (Figure 1) between longitudes 1° E and 3°40′ E and latitudes 6°30′ N and 12°30′ N. It is bordered by the Atlantic Ocean to the south, the Federal Republic of Nigeria to the east, Togo to the west, Burkina Faso to the northwest and Niger to the north. The only mountain range, which does not exceed 600 m in elevation, lies in the northwest of the country. The climate is characterized by a bimodal precipitation regime in the south and a unimodal regime in the north, while the central part of the country has a transitional regime. The mean maximum temperature for the whole country varies between 28 and 33.5 °C, while the mean minimum temperature fluctuates between 24.5 and 27.5 °C. In the southern part of the country, mean annual precipitation varies from 1500 mm in the east to 900 mm in the west, while in the north it varies from 700 mm in the far north to 1200 mm in the middle north. The main rivers are the Mékrou, Alibori and Sota in the north, and the Ouémé (the country's major river), Mono and Couffo in the central and southern parts of the country.
The upper Ouémé River basin, the study area, lies between latitudes 9° N and 10° N and longitudes 1.5° E and 2.8° E. Its climate is characterized by the alternation of a single rainy season and a dry season of approximately equal duration; the dry period extends from November to March. All the rivers of the basin have an intermittent regime.
The mesoscale site of the upper Ouémé River basin (Benin) belongs to the network of mesoscale observation sites of the AMMA-CATCH observatory. As a result, the upper Ouémé River basin is one of the most densely instrumented regions in West Africa, with instruments deployed to monitor the water cycle and vegetation dynamics.

Observational Data
Rainfall, an "essential factor of differentiation in tropical environments" [17,18], is the variable analyzed in this research. Eight of the twenty stations in the rain-gauge network managed by the Benin Meteo Agency (Figure 1) were used, selected on the basis of series length, missing data and temporal homogeneity. Stations with data covering at least the period 1951-2005 and with less than 20% missing rainy-season data [19] were retained. The complete list of these stations and their geographical coordinates is given in Table 1. The Kouandé and Tchaourou stations lie outside the study site but influence it.
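The station screening rule can be expressed as a short filter. The Python sketch below is illustrative only (the study itself worked in R). It assumes that daily series are stored as dictionaries keyed by date, with missing days either absent or set to None, that the rainy season is April to October, and it interprets "data on the period 1951-2005" as having observations in both the first and the last year. All function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

Series = Dict[date, Optional[float]]  # daily rainfall in mm; None or absent = missing

RAINY_MONTHS = range(4, 11)   # ASSUMPTION: rainy season taken as April-October
MAX_MISSING_FRACTION = 0.20   # "less than 20% missing rainy season data"


def rainy_season_missing_fraction(series: Series, start: date, end: date) -> float:
    """Share of rainy-season days in [start, end] that have no value."""
    total = missing = 0
    day = start
    while day <= end:
        if day.month in RAINY_MONTHS:
            total += 1
            if series.get(day) is None:
                missing += 1
        day += timedelta(days=1)
    return missing / total if total else 1.0


def covers_period(series: Series, start: date, end: date) -> bool:
    """ASSUMPTION: the series must hold values in both the first and the last year."""
    years = {d.year for d, v in series.items() if v is not None}
    return start.year in years and end.year in years


def select_stations(stations: Dict[str, Series],
                    start: date = date(1951, 1, 1),
                    end: date = date(2005, 12, 31)) -> List[str]:
    """Names of the stations passing both screening criteria, sorted alphabetically."""
    return sorted(
        name for name, series in stations.items()
        if covers_period(series, start, end)
        and rainy_season_missing_fraction(series, start, end) < MAX_MISSING_FRACTION
    )
```

Counting absent dates as missing means a station that starts late or stops early is penalized automatically, in addition to the explicit first/last-year check.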

Models
We evaluated a set of CMIP5 GCM-driven regional climate model (RCM) simulations over West Africa. The RCM outputs used here result from the dynamical downscaling of eight CMIP5 GCMs with RCA4, developed by the Swedish Meteorological and Hydrological Institute (SMHI), within the Coordinated Regional Climate Downscaling Experiment (CORDEX). RCA4 outputs are available over Africa at a spatial resolution of 0.44° × 0.44° (~50 × 50 km), with historical coverage from 1951 to 2005 in the CORDEX framework. Only historical simulations were used in this study. The outputs of the eight CMIP5 models downscaled by RCA4 for the IPCC Fifth Assessment Report (AR5) were obtained from the CORDEX data portal (http://www.cordex.org/data-access/esgf/). Although some models have several ensemble members, a single run, r1i1p1, was used for each model. We considered GCMs that provide data for each of the 365 days of the year; the HadGEM2-ES model, which simulates only 360 days per year, was also retained, and its missing days were generated using the R package forecast. Table 2 provides details of the GCM data used in the downscaling experiment.
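The calendar gap-filling step for HadGEM2-ES can be illustrated as follows. This is a hypothetical Python stand-in, not the R forecast procedure used in the study: it assumes one model year is given as a list of 360 daily values, places each value at the nearest position on a 365-day axis, and fills the five empty days by linear interpolation between their neighbors.

```python
from typing import List, Optional


def expand_360_to_365(year_360: List[float]) -> List[float]:
    """Map one 360-day model year onto 365 days and fill the 5 empty days.

    HYPOTHETICAL helper: nearest-position mapping plus linear interpolation,
    used here only for illustration in place of the study's R-based infilling.
    """
    if len(year_360) != 360:
        raise ValueError("expected 360 daily values")
    out: List[Optional[float]] = [None] * 365
    for i, value in enumerate(year_360):
        # Consecutive targets are about 1.014 days apart, so rounding never collides
        out[round(i * 365 / 360)] = value
    known = [k for k, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for k in range(a + 1, b):  # interior gaps only; days 0 and 364 are always mapped
            w = (k - a) / (b - a)
            out[k] = (1 - w) * out[a] + w * out[b]
    return out
```

Linear interpolation keeps each filled value between its two neighbors, so filled rainfall stays non-negative; the interpolation scheme itself is an assumption made for illustration.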

Model Performance
Evaluating model performance against climate observations requires comparing the observed values with those simulated by the climate models under consideration. Observed and simulated daily rainfall were therefore compared. In addition, annual, seasonal, quarterly and monthly totals, as well as moving averages calculated from the observed and simulated daily precipitation, were compared. To make the analysis more complete, the rainfall indices (Table 3) were calculated from the observed and simulated data and then compared. Statistical criteria were used to compare simulated and observed data. The mean absolute error (MAE) is recommended for model evaluation studies [20,21]. However, the root-mean-square error (RMSE) is more appropriate than the MAE for representing model performance when the error distribution is expected to be Gaussian [22], and its use is beneficial in certain circumstances. The same authors nevertheless recognize that when different models are evaluated using a single metric, the differences in their error distributions become larger. Accordingly, a combination of measures, including (but not limited to) RMSE and MAE, is often necessary to evaluate model performance. In the present research, the U-test [23] and statistical criteria, namely the Pearson correlation coefficient [3], the deviations of the standard deviations [3,24], the mean bias error (MBE) [19], the MAE [19,20,22] and the RMSE, were used to compare simulated and observed daily rainfall; for the precipitation indices and the annual, seasonal, quarterly and monthly totals, MBE, MAE and RMSE were used. Each criterion captures a different aspect of model performance, which is why all of them were used.
The procedure for evaluating the GCMs based on their historical simulation of precipitation is outlined below:
1. The U-test (Mann-Whitney rank test) is used to assess whether the distributions of simulated and observed rainfall differ, and hence whether the GCMs are sufficiently reliable.
2. The correlation quantifies the linear relationship between observed and simulated rainfall. The Pearson correlation coefficient is based on the covariance between the two variables: it is the covariance standardized by the product of their standard deviations.
3. The difference between the standard deviations of simulated and observed rainfall provides information on how well the models simulate interannual variability.
4. MBE quantifies the average systematic error (mean bias) of each model: the more accurate the model, the lower the mean bias. A positive bias indicates that the model tends to overestimate, whereas a negative bias indicates that the simulated values are lower than those observed. The mean bias is calculated as
$$\mathrm{MBE} = \frac{1}{NM}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(X_{Sim,ij} - X_{Obs,ij}\right)$$
where X is the value of the parameter considered for each year (April to October period) over the 55 years, the subscripts Obs and Sim denote observations and simulations, i indexes years and j stations, N is the number of years (i.e., 55) and M is the number of stations considered for each model on the study site.
5. MAE quantifies the magnitude of the average errors regardless of their sign (the magnitude of the differences between simulated and observed values). It is defined as
$$\mathrm{MAE} = \frac{1}{NM}\sum_{j=1}^{M}\sum_{i=1}^{N}\left|X_{Sim,ij} - X_{Obs,ij}\right|$$
6. RMSE reflects errors not only in the mean value but also in the interannual variability of the simulated data relative to the observations. If the RMSE is almost identical to the MAE (and not zero), a systematic error affects the mean of the simulated values, and the error in variability is low or zero. If, on the other hand, the RMSE is greater than the MAE (and not zero), the simulated values are affected both by an error in the mean and by an error in the interannual variability. RMSE is often used to describe the dispersion between observations and simulations: the lower the RMSE, the closer the simulations are to the observations. It is calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{NM}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(X_{Sim,ij} - X_{Obs,ij}\right)^{2}}$$
Statistical analyses were performed using the statistical software R 3.5.1 (R Development Core Team, http://www.R-project.org) with the packages stats and forecast.
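The three error measures of the procedure can be sketched directly from their definitions (X, Obs, Sim, N years and M stations). The Python version below is illustrative (the study used R); it assumes the observed and simulated values of one parameter are arranged as N × M grids (years × stations) and averages over all N·M pairs. Function names are hypothetical.

```python
import math
from typing import List, Sequence

# Rows are years (N), columns are stations (M); values are the parameter X.
Grid = Sequence[Sequence[float]]


def _errors(obs: Grid, sim: Grid) -> List[float]:
    """Flatten Sim - Obs over all years and stations (N * M values)."""
    if len(obs) != len(sim) or any(len(o) != len(s) for o, s in zip(obs, sim)):
        raise ValueError("obs and sim must share the same N x M shape")
    return [s - o for o_row, s_row in zip(obs, sim) for o, s in zip(o_row, s_row)]


def mbe(obs: Grid, sim: Grid) -> float:
    """Mean bias: > 0 means overestimation, < 0 underestimation."""
    e = _errors(obs, sim)
    return sum(e) / len(e)


def mae(obs: Grid, sim: Grid) -> float:
    """Mean absolute error: average error magnitude, sign ignored."""
    e = _errors(obs, sim)
    return sum(abs(v) for v in e) / len(e)


def rmse(obs: Grid, sim: Grid) -> float:
    """Root-mean-square error."""
    e = _errors(obs, sim)
    return math.sqrt(sum(v * v for v in e) / len(e))
```

With a constant offset between simulations and observations the three measures coincide; once the errors vary from one year or station to another, RMSE exceeds MAE, which is the behavior the interpretation in item 6 relies on.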

U-Test Results
First, the U-test (a rank test of the equality of two distributions) was performed to assess how well the models reproduce the temporal distribution of precipitation. The p-value was 2.2 × 10⁻⁶ at all stations, indicating a significant difference between simulated and observed data at the 5% level at all eight stations. The boxplots of observed and simulated data (Figure 2) clearly show that the models tend to produce drizzle (they overestimate low precipitation and underestimate heavy rainfall). Indeed, the heavy rains simulated by the models are concentrated between 20 and 50 mm, so observed rainfall above 50 mm is poorly reproduced by the models. This is most evident for the CanESM2 model.
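The U-test is the Mann-Whitney rank-sum test. A minimal pure-Python sketch is given below for illustration only; the study performed its tests in R. It pools both samples, assigns average ranks to ties (frequent for daily rainfall, which contains many dry days) and uses the normal approximation with tie correction and no continuity correction, so p-values may differ slightly from exact or continuity-corrected implementations. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value, normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])  # 0 marks x, 1 marks y
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values pooled[i..j]
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the block
        t = j - i + 1
        tie_term += t ** 3 - t              # used in the tie-corrected variance
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                           # all values identical
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

For long daily series such as 1951-2005, the normal approximation is appropriate; exact p-values are only practical for small samples.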
The simulated rainfall values between 0 and 20 mm have a high density (Figure 3), which reflects a tendency of climate models to simulate rainfall amounts much lower than those observed in the field. In other words, the models produce more rainy days than observed, and extreme values are less well reproduced. In Figure 3, the abscissa gives the rainfall amount in millimeters (mm).
Table 4 gives the correlation coefficients between observed and simulated rainfall at all stations. The correlations are weak: the coefficients are low at all stations. They are positive for all models except HadGEM2-ES, for which they are even weaker and negative at all stations. The correlation coefficients are highest for the CNRM-CM5 model.
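The evaluation procedure describes the Pearson coefficient as a standardized covariance. The short Python sketch below (illustrative; the study used R) makes that definition explicit for two equal-length series, for example observed and simulated daily rainfall at one station. Function names are hypothetical.

```python
import math
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))
```

Because the (n - 1) denominators cancel in the ratio, the result does not depend on whether sample or population covariances are used.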

Interannual Variability
The models underestimate the interannual rainfall variability at all stations studied: the deviations of the standard deviations are negative (Table 5).
According to Table 5, the MPI-ESM-LR model performs more satisfactorily than the other models, because its absolute deviations of the standard deviations are the lowest at all stations. The highest values are obtained for the CanESM2 model.
Based on the mean values of these deviations, the models rank from most to least satisfactory as follows: MPI-ESM-LR, GFDL-ESM2M, NorESM1-M, HadGEM2-ES, MIROC5, EC-EARTH, CNRM-CM5 and CanESM2.
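The ranking can be reproduced mechanically once the deviations are computed. In this hypothetical Python sketch (the study used R), the deviation is taken as simulated minus observed standard deviation, so negative values mean underestimated variability as in Table 5, and models are ranked by the absolute value of their mean deviation across stations. The data layout (dictionaries keyed by station and model name) is an assumption.

```python
import statistics
from typing import Dict, List, Sequence


def sd_deviation(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Simulated minus observed sample standard deviation."""
    return statistics.stdev(sim) - statistics.stdev(obs)


def rank_models(obs_by_station: Dict[str, Sequence[float]],
                sim_by_model: Dict[str, Dict[str, Sequence[float]]]) -> List[str]:
    """Model names ordered from most to least satisfactory (smallest |mean deviation| first)."""
    score = {}
    for model, sim_by_station in sim_by_model.items():
        devs = [sd_deviation(obs, sim_by_station[station])
                for station, obs in obs_by_station.items()]
        score[model] = abs(statistics.mean(devs))
    return sorted(score, key=score.get)
```

Since all deviations in Table 5 are negative, ranking by absolute mean deviation gives the same order as ranking by mean deviation from highest to lowest.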

Statistical Criteria Results
The mean bias (MBE) between simulated and observed precipitation (Table 6) is positive for all models at all stations, indicating that rainfall is overestimated, except for the CanESM2 model, which underestimates it at Bembèrèkè (−0.764), Ina (−0.615) and Tchaourou (−0.170), and the CNRM-CM5 model at Bembèrèkè (−0.026). For all models, RMSE > MAE; the errors are therefore due not only to biases in the mean but also to the models' difficulty in reproducing interannual variability.
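The reading of RMSE > MAE used here can be checked with a short derivation in the notation of the evaluation procedure (X, Obs, Sim, N and M), with the K = NM simulated/observed pairs indexed by a single subscript k:

$$
\begin{aligned}
e_k &= X_{Sim,k} - X_{Obs,k}, \qquad k = 1, \dots, K, \quad K = NM,\\
\mathrm{RMSE}^2 - \mathrm{MAE}^2 &= \frac{1}{K}\sum_{k=1}^{K} e_k^2 - \left(\frac{1}{K}\sum_{k=1}^{K} |e_k|\right)^2 = \frac{1}{K}\sum_{k=1}^{K}\left(|e_k| - \mathrm{MAE}\right)^2 \ge 0.
\end{aligned}
$$

The difference RMSE² − MAE² is therefore the variance of the absolute errors: RMSE equals MAE only when every error has the same magnitude (a constant offset being the typical case), and RMSE > MAE indicates errors whose magnitude changes between years or stations. Equal magnitudes with alternating signs also give RMSE = MAE, so the comparison is most informative when read together with the MBE.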

Rainfall Cumulation Evolution
In this section, annual, seasonal and quarterly precipitation totals are analyzed. Figure 4 shows that the data simulated by the eight models generally reproduce the interannual rainfall dynamics quite correctly. However, in terms of amplitude, all models overestimate the annual rainfall total at all stations, except the CanESM2 model, which underestimates it at Bembèrèkè, Parakou and Tchaourou. Averaged over all stations, the mean bias (Table 7) indicates an overestimation of annual rainfall totals by all models except CanESM2, which shows a very small underestimation (28.84 mm). The MAE and RMSE values are also lowest for CanESM2. Overall, CanESM2 appears to be the best model for simulating annual rainfall totals, followed by CNRM-CM5. In all cases, RMSE > MAE, so the errors are related not only to a systematic bias in the mean of the simulated totals but also to errors in their interannual variability. When the observed totals are around 1000 mm, as at Bassila, Birni, Djougou, Kouandé and Tchaourou, the CanESM2 model reproduces them best. Where the observed totals are between 1000 and 1500 mm, the other models reproduce the trends of the observations, with EC-EARTH the most accurate, followed by MIROC5.
The same behavior is observed for seasonal rainfall totals (Figure 5), which suggests that the overestimation of annual totals by the models results from their overestimation of seasonal totals. The mean bias, MAE and RMSE are lowest for the CanESM2 model (Table 8).
Continuing the analysis with quarterly precipitation totals (Figures 6-9), the following observations were made. In the first quarter (January-February-March: JFM), all models are representative at all observation sites except MIROC5, which strongly overestimates the quarterly totals. The mean bias is low for all models except MIROC5, for which it is higher (Table 9). The CanESM2, MPI-ESM-LR and GFDL-ESM2M models underestimate the first-quarter totals (negative bias), while all other models overestimate them. In descending order of accuracy, the best simulations come from the EC-EARTH, NorESM1-M, MPI-ESM-LR and HadGEM2-ES models.
Hydrology 2017, 4, x FOR PEER REVIEW 13 of 21
Figure 6. First quarter total rainfall evolution.
In the second quarter (April-May-June: AMJ), the simulated quarterly totals from all models follow the trends of the observed totals (Figure 7). However, all models overestimate these totals, with a positive mean bias everywhere (Table 10). With the lowest MBE, MAE and RMSE values, the CNRM-CM5 model appears most representative, followed by CanESM2.
In the third quarter (July-August-September: JAS), the CanESM2 model underestimates precipitation totals (MBE < 0) while all other models overestimate them (Table 11), the NorESM1-M and MPI-ESM-LR models having the highest biases. Overall, all models are representative: the simulated quarterly totals closely follow the observed ones (Figure 8), and the RMSE values are similar for all models.
In the fourth quarter (October-November-December: OND), the CanESM2 model underestimates cumulative precipitation (MBE < 0) while all other models overestimate it (MBE > 0). The MBE and RMSE values are highest for the MPI-ESM-LR model (Table 12). The lowest mean bias is obtained with the EC-EARTH model (MBE = 4.821), which therefore simulates the fourth-quarter totals most accurately in terms of bias. However, the NorESM1-M model has the smallest RMSE and is therefore the most representative. The oscillations observed in the fourth quarter (Figure 9) suggest that the models overestimate the end-of-season rains, which often occur in October.
Overall, the overestimation in the second quarter, added to that in the fourth quarter, explains the overestimation by all models at the seasonal and even the annual scale.
Thus, the models overestimate rainfall during the pre-onset period of the West African monsoon and, to some extent, during the monsoon retreat. Once the monsoon is well established (after onset), the models represent the rainfall regime better.
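The totals compared in this section are sums of daily rainfall over calendar periods. A minimal Python sketch follows (illustrative; the study used R). It assumes a daily series stored as a dictionary keyed by date, skips missing days (None), uses the quarters JFM, AMJ, JAS and OND defined above and, as an assumption, takes the season as April to October. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Optional, Tuple

QUARTER_NAMES = ("JFM", "AMJ", "JAS", "OND")
SEASON_MONTHS = range(4, 11)  # ASSUMPTION: rainy season = April-October


def rainfall_totals(daily: Dict[date, Optional[float]]):
    """Return (annual, seasonal, quarterly) totals in mm.

    annual and seasonal are keyed by year; quarterly by (year, quarter name).
    """
    annual: Dict[int, float] = defaultdict(float)
    seasonal: Dict[int, float] = defaultdict(float)
    quarterly: Dict[Tuple[int, str], float] = defaultdict(float)
    for day, rain in daily.items():
        if rain is None:  # missing days contribute nothing to the totals
            continue
        annual[day.year] += rain
        if day.month in SEASON_MONTHS:
            seasonal[day.year] += rain
        quarterly[(day.year, QUARTER_NAMES[(day.month - 1) // 3])] += rain
    return dict(annual), dict(seasonal), dict(quarterly)
```

The per-year totals obtained from the observed and simulated series form the N × M grids (years × stations) to which MBE, MAE and RMSE are applied.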

Seasonal Cycles Evolution
At this level of detail, the 11-day moving average of simulated precipitation over the period 1956-2005 (Figure 10) follows that of the observations. Over the first three months, all models except MIROC5 and NorESM1-M reproduce the average daily precipitation: MIROC5 overestimates it from mid-February and NorESM1-M from mid-March.
From April 1 to the end of June, all models overestimate the average daily precipitation, although the CNRM-CM5 model remains much closer to the observations. The average daily precipitation in July and August is better represented by all models. From September to December, however, the CanESM2 and NorESM1-M models underestimate the average daily precipitation at all stations, while the CNRM-CM5, MIROC5, MPI-ESM-LR, GFDL-ESM2M and EC-EARTH models represent it well at Bembèrèkè, Ina and Parakou but overestimate it at Bassila, Birni, Djougou, Kouandé and Tchaourou.
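The smoothed seasonal cycle of Figure 10 combines two steps: averaging each calendar day over the years, then applying an 11-day centered moving average. The Python sketch below is illustrative (the study used R). It assumes daily data keyed by date, drops 29 February to work on a 365-day year, returns NaN for calendar days without data, and wraps the window around the year end; the wrap-around is a design choice, not something stated in the text. Function names are hypothetical.

```python
import math
from datetime import date
from typing import Dict, List, Optional


def daily_climatology(daily: Dict[date, Optional[float]]) -> List[float]:
    """Mean rainfall for each of the 365 calendar days (29 February ignored)."""
    sums = [0.0] * 365
    counts = [0] * 365
    for day, rain in daily.items():
        if rain is None or (day.month == 2 and day.day == 29):
            continue
        # Day-of-year index in a non-leap reference year (0 = 1 January)
        k = date(2001, day.month, day.day).timetuple().tm_yday - 1
        sums[k] += rain
        counts[k] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def centred_moving_average(values: List[float], window: int = 11) -> List[float]:
    """Centered moving average that wraps around the ends of the annual cycle."""
    if window % 2 == 0:
        raise ValueError("window must be odd to be centered")
    half, n = window // 2, len(values)
    return [sum(values[(k + o) % n] for o in range(-half, half + 1)) / window
            for k in range(n)]
```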
In summary, the onset of the rainy season is very poorly simulated: the models start it early, with a strong overestimation of rainfall. The end of the season is also poorly simulated, with an early monsoon withdrawal for the CanESM2 and NorESM1-M models and a rather late withdrawal for the CNRM-CM5, MIROC5, MPI-ESM-LR, GFDL-ESM2M and EC-EARTH models. Moreover, unlike the observed unimodal regime, the models systematically produce a bimodal seasonal cycle with a first precipitation peak in May; the coastal regime thus persists in the models.
The results of this section confirm those obtained above at longer (annual, seasonal and quarterly) time scales.

Observed and Simulated Precipitation Indices Comparison
The analysis of Tables 13 and 14 shows that all models overestimate the CWD and the number of heavy rain days (R10mm) at all stations studied, as indicated by positive MBE values. In contrast, the average intensity per rainy day (SDII), CDD, RX1day, RX3day, RX5day, the number of very heavy rain days (R20mm), the number of extremely heavy rain days (R25mm), the annual total rainfall on very wet days (R95P) and the annual total rainfall on extremely wet days (R99P) are underestimated by the models (MBE < 0).
The RMSE values suggest that the errors of most indices are essentially related to a systematic bias in the mean of the simulated indices, with little or no contribution from errors in interannual variability. However, the slightly larger RMSEs relative to MAEs for the CDD and RX3day indices for most models reveal an additional difficulty in reproducing the interannual variability of these two indices (the maximum number of consecutive dry days and RX3day). For the other indices, the RMSE and MAE values are almost identical. The number of rainy days is the most biased index.
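Several of the indices compared in Tables 13 and 14 are simple functions of a daily series. The Python sketch below is illustrative (the study used R) and assumes the usual ETCCDI conventions (wet day: at least 1 mm; R10mm, R20mm and R25mm: counts of days with at least 10, 20 and 25 mm), which may differ in detail from the definitions in Table 3. R95P and R99P are left out because they require percentiles from a reference period. Function names are hypothetical.

```python
from typing import Iterable, Sequence

WET_DAY_MM = 1.0  # ASSUMPTION: ETCCDI wet-day threshold


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def cdd(pr: Sequence[float]) -> int:
    """Maximum number of consecutive dry days (< 1 mm)."""
    return longest_run(p < WET_DAY_MM for p in pr)


def cwd(pr: Sequence[float]) -> int:
    """Maximum number of consecutive wet days (>= 1 mm)."""
    return longest_run(p >= WET_DAY_MM for p in pr)


def r_nn_mm(pr: Sequence[float], nn: float) -> int:
    """Number of days with at least nn mm (R10mm, R20mm, R25mm)."""
    return sum(1 for p in pr if p >= nn)


def rx_n_day(pr: Sequence[float], n: int) -> float:
    """Maximum n-day running total (RX1day, RX3day, RX5day)."""
    return max(sum(pr[i:i + n]) for i in range(len(pr) - n + 1))


def sdii(pr: Sequence[float]) -> float:
    """Mean rainfall per wet day (average intensity per rainy day)."""
    wet = [p for p in pr if p >= WET_DAY_MM]
    return sum(wet) / len(wet) if wet else 0.0
```

Applied year by year to the observed and simulated series at each station, each index would give the N × M grids on which MBE, MAE and RMSE are computed.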