Development and Assessment of Seasonal Rainfall Forecasting Models for the Bani and the Senegal Basins by Identifying the Best Predictive Teleconnection

The high variability of rainfall in the Sahel region causes droughts and floods that affect millions of people every year. Several rainfall forecasting models have been proposed, but the results still need to be improved. In this study, linear, polynomial, and exponential models are developed to forecast rainfall in the Bani and Senegal River basins. All three models use Atlantic sea surface temperature (SST). A fourth algorithm using stepwise regression was also developed for the precipitation estimates over these two basins. The stepwise regression algorithm uses SST with covariates: mean sea level pressure (MSLP), relative humidity (RHUM), and five El Niño indices. The explanatory variables SST, RHUM, and MSLP were selected based on principal component analysis (PCA) and cluster analysis to find the homogeneous region of the Atlantic with the greatest predictive ability. PERSIANN-CDR rainfall data were used as the dependent variable. Models were developed for each pixel of 0.25° × 0.25° spatial resolution. The second-order polynomial model with a lag of about 11 months outperforms all other models and explains 87% of the variance in precipitation over the two watersheds. Nash–Sutcliffe efficiency (NSE) values were between 0.751 and 0.926 for the Bani River basin and between 0.175 and 0.915 for the Senegal River basin, for which the lowest values are found in the driest area (Sahara). The results indicate that North Atlantic SST has a more robust teleconnection with precipitation dynamics in both basins.


Introduction
The Sahel is a semi-desert region stretching across Africa from the Atlantic Ocean to the Red Sea and separating the Sahara Desert from the moist savannah. This region is widely known for its vulnerability to desertification, its scarce water availability, and its rapid environmental degradation [1]. Indeed, interannual rainfall has changed considerably over the last five decades. Severe drought in the 1970s and 1980s brought famine and humanitarian crisis to the region [2,3,4]. In contrast, early in the current century, Samimi et al. [5] observed intense rainfall in 2007, equivalent to values with a return period of 1200 years. In addition, Biasutti [6] reports an increase in rainfall in the central and eastern Sahel, as well as a decrease in rainfall in the western Sahel, with intense and isolated rainfall.
Rainfall variability in the Sahel is dynamically related to the variability of atmospheric circulation, Hadley cells, and West African Monsoon (WAM) circulation [6]. The WAM is a coupled atmosphere-ocean-land system [7] responsible for summer rainfall in the Sahel from May to October [4,6], with the most significant rainfall occurring in July-August-September (JAS) [8,9]. The WAM flows as a shallow, moist surface air layer from the Gulf of Guinea, overlaid by the main northeast trade winds, which blow from the Sahara and are known as the Harmattan. The convergence of the trade winds and the WAM forms the Intertropical Convergence Zone (ITCZ), which is characterized by low pressure and is laden with heat and moisture [10].
Sufficient and reliable observations are necessary to understand the great interannual variability of precipitation in the Sahel, and therefore the evolution of the hydrological regimes. However, in the Sahel, surface and high-altitude observations are very rare and, when they do exist, their reliability is always questionable [8].
Weather forecasts are the result of field observations and general circulation models (GCM). However, due to the small amount of observed data and the limitations of the GCM scale, it is not possible to satisfactorily answer key questions about the interrelation of atmosphere, ocean, and land [11,12]. As a result, statistical models are more popular for applications that require a high spatial and temporal resolution [13]. Nonetheless, every climate model carries forecast uncertainty due to common systematic bias [6].
In West Africa, rainfall forecasting is performed by the Prévisions Climatiques Saisonnières en Afrique Soudano-Sahélienne forum (PRESASS) [14,15], formerly known as Prévisions Climatiques Saisonnières en Afrique de l'Ouest (PRESAO). Every year, between April and May, the forum is held to prepare that year's seasonal forecast. The event is convened by the African Center of Meteorological Application for Development (ACMAD) and the Centre Régional de Formation et d'Application en Agrométéorologie et Hydrologie Opérationnelle (CRA-AGRHYMET) [16]. For the rainfall forecast, the Climate Predictability Tool (CPT) and techniques such as statistical methods, dynamic models, and expert judgment are used [13]. Forecasts are then compared to the climate outlooks of international climate centers, and a consensus is reached before the forecast is communicated to users. The outlook is categorical, meaning that it consists of qualitative descriptions such as above normal, normal, and below normal rainfall probability [13,16]. However, the forecasts and models currently used by the PRESASS forum are not precise enough, and droughts and floods catch authorities and people off guard. That is why it is necessary to develop better models [12,13].
The purpose of this study is to establish a forecasting model for rainfall during the rainy season (May-October) across the Bani and Senegal River basins in West Africa, in order to support decision-making processes in water management. To achieve this objective, ocean-atmospheric variables and linear and non-linear models were used. Statistical techniques such as principal component analysis (PCA) and cluster analysis were applied to the Atlantic SST, RHUM, and MSLP variables while building candidate forecasting models, in order to find the region with the highest rainfall predictive power. In the linear, polynomial, and exponential models, SST was used as the only predictor. In the linear stepwise regression model, the RHUM and MSLP covariates and the Niño1 + 2, Niño3.4, Niño4, Oceanic Niño (ONI), and trans-Niño (TNI) indices were used in addition to SST. As the response variable for the models, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) data were used [17]. Forecasting was performed at the same spatial resolution as PERSIANN-CDR (0.25° × 0.25°) for the 725 pixels that make up the Bani and the Senegal River basins.

Description of the Study Region
The Sahel is the vast semi-arid region of Africa separating the Sahara Desert to the north from the tropical savannas. From west to east, the Sahel stretches from northern Senegal and southern Mauritania to Eritrea and northern Ethiopia. The region is home to nearly 130 million people. The main livelihoods are stockbreeding, fishing, and subsistence agriculture; the latter is the most important sector and supports most of the people who inhabit this region [18].
This study considers only the western part of the Sahel, located between 13°W and 4°W and between 8°N and 20°N, comprising the areas of the Bani River basin at the Beneny Kegny hydrometric gauge (upper Niger) and the Senegal River basin at Bakel (Figure 1). Most of the basins' area is located in the Sahel (semi-arid climate), while the southern part of the basins has a tropical savanna climate, and the north of the Senegal River basin (center of Mauritania) corresponds to a warm desert climate.

Delimitation of the basins was based on a digital elevation model (DEM) with a spatial resolution of 3 arc-seconds (0.000833°, ~90 m), taken from the Shuttle Radar Topography Mission (SRTM) [19], available at https://srtm.csi.cgiar.org/ (accessed on 20 December 2018). In a study conducted by Bâ et al. [20] on the Senegal River, a ~1 km resolution DEM was used to delimit the Senegal basin. That study revealed that the drainage areas up to the Bakel and Kayes hydrometric gauges are larger than those reported by the official map of the Organisation pour la Mise en Valeur du fleuve Sénégal (OMVS). However, the area that was not considered by the OMVS is in the northern part of the watershed, located in the Sahara Desert, which does not contribute to the runoff.
The Senegal River basin at Bakel has an approximate area of 440,000 km²; its altitude ranges from 15 to 1400 m a.s.l., the terrain's slope varies between 0% and 19%, and the average slope is 1.08% (very gently sloping). Annual rainfall is about 80 mm year⁻¹ in the northern part of the basin, while at the southern edge of the basin it can reach 2000 mm year⁻¹ [8]. According to GlobCover [22], the land cover in the north of the basin is characterized by a lack of vegetation (bare areas); the center of the basin is covered by brushwood, crop mosaics (mosaic cropland), and lush vegetation (grassland, shrubland, and forest); and the southern part of the basin is covered by broadleaved evergreen forest and semi-deciduous forest [23].

Input Data
In several countries of the region, meteorological data are not free and the available data records lack continuity and reliability [8,24]. At some weather stations, daily rainfall has been recorded since the beginning of the last century, but data from recent years are not accessible. The most recent data to which we had access end before 2015. Analysis shows that these data are of suitable quality [8,25].
Recent studies have shown that rainfall data from satellite products provide suitable results in this region [8,25,26]. For example, the PERSIANN-CDR product [17] has records from 1983 to the present at 0.25° spatial resolution and daily frequency. Bâ et al. [8] evaluated monthly PERSIANN-CDR data at 18 rain gauges located in West Africa and then used the daily rainfall from the same product for flow simulation. The results of the evaluation were satisfactory (R² > 0.8) at most gauges. PERSIANN-CDR data can be found at the website of the Center for Hydrometeorology and Remote Sensing (CHRS) at the University of California, Irvine (https://chrsdata.eng.uci.edu/, accessed on 16 September 2018).
For this study, rainfall data from 20 rain gauges were used (Figure 1) to evaluate the PERSIANN-CDR data. A total of 456 monthly images of the PERSIANN-CDR product between 1983 and 2020 were used and evaluated over the concomitant period. The rainfall data were obtained from the Meteorological Services of Mauritania, Senegal, Guinea, Mali, and Côte d'Ivoire. These data are used only in the validation of the satellite-estimated products.
The historical PERSIANN-CDR data are used as the dependent variable in the forecasting models. The basins of the Bani and Senegal Rivers are made up of 725 pixels of 0.25° × 0.25°. In addition, Atlantic Ocean data for sea surface temperature (SST), mean sea level pressure (MSLP), and relative humidity (RHUM), together with five Pacific indices (Niño1 + 2, Niño3.4, Niño4, ONI, and TNI), are used as explanatory variables of the models.
SST is the main variable in the rainfall forecast [2,13]. In the forecast models, SST over the Tropical Atlantic (70°W, 20°E, 20°S, 40°N) was used as the predictor variable of the linear, polynomial, and exponential models. In addition, the MSLP and RHUM data (over the same extent) and the El Niño Southern Oscillation (ENSO) indices were included in a multivariate model. These indices are based on the SST anomalies of tropical Pacific regions: Niño1 + 2, Niño3.4, Niño4, ONI, and TNI (derived from Niño1 + 2 and Niño3.4) [27].
The SST reanalysis product of ERA5 was used at 0.25° × 0.25° spatial resolution and daily frequency [28]. These data come from two providers. Before September 2007, SST data are from the Group for High-Resolution Sea Surface Temperature (GHRSST) of the National Centers for Environmental Information (NCEI). From September 2007 onward, data from the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) [29] were used. The ERA5 data (SST, MSLP, and RHUM) are available at https://cds.climate.copernicus.eu [28], and the NCAR/UCAR Climate Data Guide El Niño indices are available at https://climatedataguide.ucar.edu/climate-data/nino-sst-indices-nino-12-3-34-4-oni-and-tni [27], accessed on 25 September 2018.
Sea temperature data from 5 buoys are used to validate the SST of the Atlantic Ocean. These data were obtained from the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) of the National Data Buoy Center, National Oceanic and Atmospheric Administration (NOAA) [30]. Buoy data are available at https://www.ndbc.noaa.gov/, accessed on 26 September 2018. Buoys located near the coast of West Africa that have continuous records were chosen (Figure 1). The assessment of the PERSIANN-CDR and SST products, on the one hand, and of the forecast models, on the other, is carried out point-to-pixel. That is, the rainfall at the rain gauges and the temperature at the buoys are compared with the data of the raster cells where the gauges and buoys are located.

Ocean-WAM Teleconnections
It is known that the Atlantic Ocean is the main source of humidity for West Africa [2,13], but it is necessary to determine in which region the ocean-atmospheric variables are best teleconnected with the WAM. In addition, it is important to find the optimal time lag between the variables and the WAM. Previous studies have partitioned the SST field with different techniques. In a study on the Sahel, Folland et al. [4] grouped pixels of the SST anomalies into 10° × 10° cells. Gado et al. [31] and Sittichok et al. [13] decreased the number of SST components of the Atlantic using PCA and canonical correlation analysis (CCA).
Phenomena with a social impact on the region, such as droughts and floods, are related to the tropical Atlantic variability (TAV). The TAV interacts, in particular, with trade wind fluctuations, SST, and precipitation [32]. The SST is teleconnected with rainfall at a lag that varies between 0, 6, and 12 months depending on the location in the ocean [13,31]. In this study, PCA and cluster analysis were used to group Atlantic SST pixels with homogeneous values and to find the region with the best teleconnection between ocean variables and the rainfall of the Bani and Senegal River basins. The PCA performs an orthogonal transformation on the 456 monthly images between 1983 and 2020. The result is a reduced set of synthetic explanatory covariates, called principal components, which are not correlated with each other [33,34]. Clustering was applied to group the pixels with maximum homogeneity within each group. The k-means method was used to find the differences between groups [34]. Cross-correlation analysis was used to find the optimal lag between the predictors (SST, RHUM, MSLP, El Niño indices) and the PERSIANN-CDR precipitation.
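The regionalization step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the R script used in the study: the data are synthetic, the function names (pca_scores, farthest_first_init, kmeans) are hypothetical, and the deterministic farthest-first initialization is an assumption made here so that the example is reproducible. Each row is one ocean pixel and each column one of the 456 monthly images.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project pixels (rows) onto the leading principal components of the monthly images (columns)."""
    Xc = X - X.mean(axis=0)                        # centre each monthly image
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                # fraction of variance per component
    return U[:, :n_components] * S[:n_components], explained[:n_components]

def farthest_first_init(points, k):
    """Deterministic initial centres: repeatedly pick the point farthest from those already chosen."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist = np.linalg.norm(points[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(points[dist.min(axis=1).argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point and the final centres."""
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # final assignment so that labels always match the returned centres
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

# Synthetic stand-in for an SST field: three regions of 50 pixels each,
# differing in mean temperature and seasonal amplitude (values are invented).
rng = np.random.default_rng(42)
months = np.arange(456)
season = np.sin(2 * np.pi * months / 12)
X = np.vstack([
    mean + amp * season + rng.normal(0, 0.2, size=(50, 456))
    for mean, amp in [(20.0, 1.0), (25.0, 3.0), (28.0, -2.0)]
])

scores, explained = pca_scores(X, n_components=2)
labels, centers = kmeans(scores, k=3)
print("variance explained by two components:", explained.sum())
print("pixels per cluster:", np.bincount(labels))
```

In practice the number of clusters would be chosen with a criterion such as the silhouette method rather than fixed in advance as it is in this toy example.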

Forecasting Models
Linear and non-linear models are commonly used to find the relationship of predictors or explanatory variables to the response variable [2]. The linear, second-order polynomial, stepwise, and exponential models are defined in Equations (1) through (4), respectively.
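For orientation, the four model families can be written in a generic form consistent with the symbol definitions given in this section. The block below is a sketch under assumptions: the numbering follows the order stated above, and the exact parameterization, in particular that of the exponential model, is assumed rather than taken from the original equations.

```latex
\begin{align}
\hat{Y} &= b_0 + b_1\,\mathrm{SST}_{lag_{SST}} \tag{1}\\
\hat{Y} &= b_0 + b_1\,\mathrm{SST}_{lag_{SST}} + b_2\,\mathrm{SST}_{lag_{SST}}^{2} \tag{2}\\
\hat{Y} &= b_0 + b_1\,\mathrm{SST}_{lag_{SST}} + \dots + b_n\,\mathrm{TNI}_{lag_{TNI}} \tag{3}\\
\hat{Y} &= b_0\, e^{\,b_1\,\mathrm{SST}_{lag_{SST}}} \tag{4}
\end{align}
```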
In these equations, Ŷ is the forecast precipitation; b₀, b₁, ..., bₙ are the coefficients of the models; e is Euler's number; and SST_lagSST, ..., TNI_lagTNI are the ocean-atmospheric covariates with their respective lags. The coefficients are obtained by the method of least squares; in the case of the exponential model, they are obtained by trial and error in an iterative process.
Eight covariates were used in the stepwise regression model. The collinearity analysis, measured with the variance inflation factor (VIF), suggests reducing the number of variables to decrease variance inflation and avoid overfitting the model [35].
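The VIF of a covariate is 1/(1 − R²), where R² comes from regressing that covariate on all the others. The following Python/NumPy sketch illustrates the computation on synthetic series; the function name vif and the toy variables are hypothetical, and the near-duplicate pair only mimics the kind of collinearity that can arise between related ENSO indices.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # intercept + remaining covariates
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        factors[j] = 1.0 / (1.0 - r2)
    return factors

# Synthetic covariates (invented values): two independent series and a nearly collinear pair.
rng = np.random.default_rng(0)
sst = rng.normal(size=300)
rhum = rng.normal(size=300)
oni = rng.normal(size=300)
nino34 = oni + 0.05 * rng.normal(size=300)   # almost a copy of `oni`

factors = vif(np.column_stack([sst, rhum, oni, nino34]))
print(dict(zip(["sst", "rhum", "oni", "nino34"], factors.round(1))))
```

A common rule of thumb is to drop or combine covariates whose VIF exceeds a chosen threshold (values of 5 or 10 are often quoted); the threshold used in the study is not stated here.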
Rainfall forecasts in the 725 pixels of the study area are calculated using the SST data of the Atlantic regions. The models of Equations (1)-(4) are applied in each of the pixels that make up the basins of the Bani and Senegal Rivers. The results of Equations (1), (2), and (4) for each SST region of the Atlantic are compared with the PERSIANN-CDR value of each pixel (dependent variable). For the stepwise regression model, Equation (3), all combinations of the predictors were considered, taking into account the lag of each of them.
The covariate series and the dependent variable (PERSIANN-CDR) were divided into two samples: 70% for model development and 30% for model validation. This procedure was carried out using an R script (see supplementary materials: https://github.com/lebalcazar/sahel). About 600 iterations were performed. The parameters of the non-linear model, Equation (4), are calibrated by trial and error; once these converge, the best model parameters are obtained. In the R script, the number of iterations used to obtain the parameters of the exponential model is capped at 1000; however, it was observed that they converge after 100 iterations.
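The development/validation procedure for a single pixel can be illustrated as follows. This Python/NumPy sketch is not the R script referenced above: the data are synthetic, the helper names (lagged_pairs, nse) are hypothetical, and a chronological 70/30 split is assumed here because the way the samples were drawn is not specified.

```python
import numpy as np

def lagged_pairs(predictor, response, lag):
    """Pair response[t] with predictor[t - lag]; both outputs have equal length."""
    return predictor[: len(predictor) - lag], response[lag:]

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic monthly series (invented): rainfall responds quadratically to SST 11 months earlier.
rng = np.random.default_rng(1)
t = np.arange(456)
lag = 11
sst = 26 + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, t.size)
rain = np.full(t.size, np.nan)                 # first `lag` months have no predictor
s = sst[:-lag] - 26
rain[lag:] = 100 + 60 * s + 15 * s**2 + rng.normal(0, 5, s.size)

x, y = lagged_pairs(sst, rain, lag)
split = int(0.7 * len(x))                      # 70% development, 30% validation
coeffs = np.polyfit(x[:split], y[:split], 2)   # least-squares second-order polynomial
forecast = np.polyval(coeffs, x[split:])
score = nse(y[split:], forecast)
print("validation NSE:", round(score, 3))
```

Repeating this fit for every pixel and every candidate SST region, and keeping the combination that satisfies the selection criteria, mirrors the per-pixel workflow described in this section.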

Model Assessment
Once the models are obtained for each pixel, an assessment is performed with the validation sample for each pixel. Next, the forecast rainfall is compared with the observed data using objective criteria such as the adjusted coefficient of determination (R²adj), Equation (5), and the Akaike information criterion (AIC), Equation (6). Models with significant parameters (p-value < 0.05), R²adj > 0.5, and the lowest AIC value are chosen, restricted to models with a lag greater than or equal to six months, a reasonable time for authorities to take preventive measures [31].
In Equation (5), n is the number of observations in the sample, k is the number of model variables, and R² is the coefficient of determination, Equation (7). R²adj indicates how effectively the independent variables explain the response variable. Increasing the number of independent variables increases the quotient (n − 1)/(n − k − 1), so the adjusted value is reduced as variables are added; R²adj therefore penalizes the addition of coefficients to the model [36].
In Equation (6), k is the number of variables in the model and L is the maximum likelihood value for the estimated model. The AIC proposed by Akaike [37] is used in model selection. This criterion considers both the goodness of fit and the complexity of the model, through a penalty on the number of explanatory variables used. For example, a model with a larger number of explanatory covariates is more likely to fit better; however, this can result in overfitting, which is penalized by the AIC.
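The two selection criteria can be computed as in the Python/NumPy sketch below. The adjusted R² uses the quotient (n − 1)/(n − k − 1) described above, and the AIC uses the standard form 2k − 2 ln(L). The function names are hypothetical, and evaluating the likelihood under Gaussian residuals is an assumption made for this illustration.

```python
import numpy as np

def adjusted_r2(r2, n, k):
    """R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def aic_gaussian(obs, sim, k):
    """AIC = 2k - 2 ln(L), with L the maximized likelihood assuming Gaussian residuals."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    n = obs.size
    sigma2 = np.mean((obs - sim) ** 2)                  # maximum-likelihood residual variance
    log_l = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return 2 * k - 2 * log_l

# Same fit quality, different model sizes: the larger model is penalized by both criteria.
obs = np.array([100.0, 200.0, 300.0, 250.0, 150.0])
sim = np.array([110.0, 190.0, 330.0, 240.0, 160.0])
print(adjusted_r2(0.87, n=100, k=2), adjusted_r2(0.87, n=100, k=8))
print(aic_gaussian(obs, sim, k=2), aic_gaussian(obs, sim, k=3))
```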
Here, obs_i and sim_i are, respectively, the observed and simulated values of month i; obs̄ and sim̄ are, respectively, the observed and simulated means; and n is the number of data points. R² measures the proportion of the variance explained by the model. R² varies between 0 and 1, with 1 being the optimal value, and values greater than 0.50 are considered acceptable [41].
The NSE is used to determine the relative magnitude of the residual variance compared with the variance of the observations. The NSE coefficient varies between −∞ and 1; an NSE equal to 1 indicates a perfect simulation. An NSE equal to 0 indicates that the predictions of the model are as accurate as the mean of the observed data. A negative NSE indicates that the observed mean is a better predictor than the model [40,41].
PBIAS is used to determine how well the model simulates the average magnitudes of the output response of interest. PBIAS is useful for long-term continuous simulations and makes it possible to identify the average bias of the model simulations. PBIAS varies between −∞ and ∞, and 0 is the optimal value. Positive values indicate that the model overestimates the observed rainfall, and negative values indicate that the model underestimates it [39].
RE is the quotient between the absolute error of the simulated rainfall and the observed rainfall. This makes it possible to compare the performance of the model across different responses. In addition, the differences between the observed and simulated values are quantified as relative deviations, which significantly reduces the influence of absolute differences during peaks [40].
MAE measures the error of the values calculated by the model. It is calculated and presented in the same unit as the modeled variable; therefore, it is easier to interpret. In addition, it is very useful in long-term continuous simulations [40]. MAE usually has a magnitude that is equal to or less than RMSE; however, RMSE gives greater weight to the peaks, so adjustments must be made using the standard deviation of the observations [41].
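The evaluation metrics described above can be written compactly as follows. This is a Python/NumPy sketch using the standard textbook definitions, which are assumed to match the equations of this section; the sign convention for PBIAS follows the text (positive values mean overestimation), and the function names and sample values are hypothetical.

```python
import numpy as np

def r2(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    return np.corrcoef(obs, sim)[0, 1] ** 2

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias; positive when the model overestimates on average."""
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

def relative_error(obs, sim):
    """Absolute error of each simulated value divided by the observed value."""
    return np.abs(sim - obs) / obs

def mae(obs, sim):
    """Mean absolute error, in the units of the observations."""
    return np.mean(np.abs(sim - obs))

# Invented monthly rainfall values (mm) for illustration only.
obs = np.array([100.0, 200.0, 300.0])
sim = np.array([110.0, 190.0, 330.0])

print("R2   :", r2(obs, sim))
print("NSE  :", nse(obs, sim))          # 1 - 1100 / 20000 = 0.945
print("PBIAS:", pbias(obs, sim))        # 100 * 30 / 600 = 5.0
print("RE   :", relative_error(obs, sim))
print("MAE  :", mae(obs, sim))          # (10 + 10 + 30) / 3
```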

Results
The most relevant results obtained in this study are presented below.

Validation of Satellite Products
The process begins with the comparison between the monthly PERSIANN-CDR data and the monthly gauged rainfall, followed by the comparison between ERA5 monthly mean temperature and observed monthly mean temperature at 5 Atlantic buoys, Tables 1 and 2, respectively.

Classification of the Atlantic Variables
SST, RHUM, and MSLP Atlantic data were processed (456 monthly raster images of each variable between 1983 and 2020). The PCA and cluster analysis allowed us to divide the Tropical Atlantic into homogeneous regions. The PCA reduced the set of 456 images to two principal components that explain 97.3% of the SST variance, 96.0% of the MSLP variance, and 98.0% of the RHUM variance. The principal components were used in the cluster analysis. The k-means method and the silhouette method suggest that the optimal number of clusters is k = 3 for SST and RHUM and k = 2 for MSLP (Figure 2).
Remote Sens. 2022, 14, x FOR PEER REVIEW


Selection of Forecast Models
SST of the 3 Atlantic regions (Figure 2) was used in the linear, polynomial, and exponential models. Cross-correlation analyses showed that SST has the highest correlation at time lags of 5, 10, and 11 months for the SST1, SST2, and SST3 regions, respectively. Then, in the stepwise regression model, the covariates of the SST, RHUM, and MSLP regions were used (Figure 2), together with the Pacific indices Niño1 + 2, Niño3.4, Niño4, ONI, and TNI.
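Lag selection by cross-correlation amounts to correlating rainfall at month t with the predictor at month t − lag for each candidate lag and keeping the lag with the strongest correlation. The Python/NumPy sketch below illustrates this on synthetic data; the function name lagged_correlation, the 0 to 12 month search window, and the series themselves are assumptions made for the example.

```python
import numpy as np

def lagged_correlation(predictor, response, max_lag=12):
    """Pearson correlation between response[t] and predictor[t - lag], for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = predictor[: len(predictor) - lag]
        y = response[lag:]
        result[lag] = np.corrcoef(x, y)[0, 1]
    return result

# Synthetic series (invented): rainfall follows SST with a true lag of 11 months.
rng = np.random.default_rng(7)
n, true_lag = 456, 11
t = np.arange(n + true_lag)
sst_full = 26 + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)
rain_full = 50 + 20 * np.roll(sst_full, true_lag) + rng.normal(0, 5, t.size)
# Drop the first `true_lag` months, where np.roll wrapped values around the end.
sst, rain = sst_full[true_lag:], rain_full[true_lag:]

corr = lagged_correlation(sst, rain, max_lag=12)
best_lag = max(corr, key=lambda lag: abs(corr[lag]))
print("best lag (months):", best_lag, "correlation:", round(corr[best_lag], 3))
```

Because monthly SST is strongly seasonal, correlations at neighbouring lags are also high, which is why the comparison across lags, rather than any single value, identifies the optimum.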
It was observed that SST is the variable with the greatest influence on the WAM, and the coefficient of determination between PERSIANN-CDR and SST in a linear model is greater than 0.70. When the models are applied to each pixel, the polynomial model reproduces rainfall in the Bani and Senegal River basins best, followed by the stepwise regression model. The highest performance was obtained with the polynomial model and SST3, the north Tropical Atlantic region, with a lag of 11 months (NSE ≈ 0.80). In the south of the basins, at the Mamou, Labe, Odienne, and Korhogo rain gauges, performance was better with the stepwise regression model and the SST3 region; however, the AIC difference between these models is negligible. On the other hand, in the north of the Senegal basin, no model was able to reproduce rainfall while meeting the selection criteria (p-value < 0.05, R²adj > 0.5, and lower AIC).
To compare the results of the stepwise regression model in the three regions, the covariates that improve the explanation of the variance of the phenomenon are added to the SST. This model provided the best results in the SST3 region, with an R²adj of 0.845; that is, more than 84% of the precipitation variability is explained by the multivariate model. However, to avoid overfitting the model, the VIF analysis [35] suggests using only the SST, RHUM, Niño1 + 2, and TNI covariates.
In the northwest of the Senegal River basin, at the Bakel, Matam, and Kiffa rain gauges, some pixels performed better with the polynomial model and the SST1 region (Gulf of Guinea); however, the lag of these simulations is 5 months, which is shorter than the six-month minimum set as an objective in this research. When comparing the AIC and R²adj between the polynomial model with the SST1 and SST3 regions, the difference is negligible (Figure 3). Table 3 shows that the polynomial model yields the highest NSE coefficient values, followed by the stepwise regression model, the exponential model, and finally, the simple linear model. In all cases, no model was able to predict rainfall in the northern Senegal River basin. To obtain a rainfall forecast over the entire area of the basins, the restrictions of the selection criteria were removed: p-value < 0.05, R²adj > 0.5, lower AIC, and lag greater than or equal to six months.
Rainfall forecasts with the polynomial model (Figure 4) adequately reproduce the seasonal rainfall of the WAM. Figure 5 shows the spatio-temporal distribution of the rainfall forecast. Rainfall is distributed in time between May and October, mainly in the south of the Bani and Senegal basins. In July and August, the rainfall is distributed throughout the basins' area, with a decrease in rainfall as latitude increases. Rainfall forecasts from 1984 to 2020 are presented in Table A1, Appendix A (as complementary material). The average rainfall in the period 1984-2020 is ~690 mm year⁻¹, the minimum is ~490 mm in 1985, and the maximum is ~830 mm in 2016.


Comparison of the Forecasts with the Reference Rainfall
According to the PRESASS forecast [14], summer rainfall in 2021 in the Sahel and Sudan regions was expected to be equivalent to or higher than normal. In addition, rainfall was predicted to have an early to normal start and a late to normal end. As historical PERSIANN-CDR data are available, these records were used to obtain the normal precipitation (1991-2020), as in PRESASS [15].
The results of this study were compared with those of PRESASS for 2021 as an example. The gamma distribution function was fitted to each PERSIANN-CDR monthly precipitation sample of each pixel to determine the quantiles at three characteristic values of the probability of non-exceedance. Figure 6 presents the monthly frequency hyetograph (MFH) at pixels where a rain gauge is located, for the rainy season (May to October), together with the rainfall of the year 2021. Rainfall quantiles were computed for probabilities of 0.30, 0.50, and 0.70, representing, respectively, below normal, near normal, and above normal conditions.
At the beginning of the rainy season, and for all regions, forecasts are around normal and almost equivalent to observed rainfall (PERSIANN-CDR). From May to October, forecasts are around normal or above normal in most pixels. In general, the 2021 forecast is consistent with the forecast of PRESASS [14].
When comparing the average rainfall of the forecast for the year 2021 with the normal precipitation, 90% of the area corresponds to wet regions (Figure 7). In the southern region of the Bani and Senegal River basins and at the northern end of the Senegal River basin, the forecast is classified as around normal.
Table 4 shows the comparison between the rainfall forecasts (May-October) and the PERSIANN-CDR data for the years 2017-2021. It was found that the forecasts (FRC) are lower than but close to the observed rainfall (PERSIANN-CDR). The relative error varies between 1.1% and 38%. However, at the northern end of the Senegal River basin (such as Tidjikja, see Table A2, Appendix A), the relative error can exceed 100% because the models do not simulate rainfall well enough in this area. The performance of the forecast model was measured on each pixel; Table 4 presents a sample of 4 pixels where weather stations are located. The errors for the remaining pixels are presented in Table A2, Appendix A.

Rainfall Distribution
In West Africa, rainfall decreases with increasing latitude. In the south, in the upper part of the Bani and Senegal River basins, precipitation is about 2800 mm year⁻¹, while in the north, it is about 100 mm year⁻¹ [8]. The humidity of West Africa is almost entirely supplied by the WAM. The wet period is from May to October in the south of the basins [4,6], and the rainiest months in the area are July, August, and September [7,8].
Previous studies in the Sahel report several hypotheses to explain the drought that occurred in recent decades [31,42,43]. The most significant ones are listed below. Cooling of the SST in the north Tropical Atlantic and sudden warming in the south lead to the migration of the ITCZ further south, causing lower humidity in the region [2,9,32]. In addition, events such as El Niño and La Niña influence the rainfall patterns of tropical areas [42].
Satellite products were validated with observations in situ.PERSIANN-CDR [17] precipitation and temperature product data were compared with the rain gauge data at pixel-to-point, and the coefficient of determination was calculated.Results are consistent with those reported in previous studies in the same region [8,24,25].It is important to highlight that the SST data were validated with observations on the Atlantic buoys at pixel-to-point.The results were very good, and the R 2 was higher than 0.93.
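As an illustration of the pixel-to-point validation, the coefficient of determination between a satellite series and a gauge (or buoy) series can be sketched as below. This assumes R² is taken as the squared Pearson correlation, which is one common convention and is not confirmed by the text; the function name and sample values are hypothetical.

```python
def coefficient_of_determination(satellite, gauge):
    """Squared Pearson correlation between two equally long series."""
    n = len(satellite)
    if n != len(gauge) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_s = sum(satellite) / n
    mean_g = sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(satellite, gauge))
    var_s = sum((s - mean_s) ** 2 for s in satellite)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    if var_s == 0 or var_g == 0:
        raise ValueError("series must not be constant")
    return cov * cov / (var_s * var_g)


# Hypothetical monthly values for one pixel and its gauge; not data from the study.
r2 = coefficient_of_determination([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```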

Atlantic Regions
Statistical methods of PCA and cluster analysis were used to regionalize SST, RHUM, and MSLP of the Tropical Atlantic. Grouping into homogeneous regions reduces the computing time of the forecasting processes. In recent studies, different techniques have been used to reduce the spatial-temporal dimension of the variables. For instance, Sittichok et al. [13] used PCA methods, canonical correlation, and stepwise regressions between the Atlantic and Pacific to reduce the number of components and select the optimal lag. Folland et al. [4] grouped the 2.5° × 2.5° SST into 10° × 10° cells to reduce the number of components. Studies carried out by Gado Djibo et al. [2,31] used the method proposed by Sittichok et al. [12,13] to estimate the optimal lag. In addition, they used Bayesian algorithms to detect change points for the purpose of combining models with dynamic parameters.
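The grid aggregation mentioned above (2.5° cells grouped into 10° cells) corresponds to merging non-overlapping 4 × 4 blocks. A minimal sketch follows, assuming a plain arithmetic mean over each block; the cited study may have used a different aggregation, and the function name and values are hypothetical.

```python
def coarsen(grid, factor=4):
    """Average non-overlapping factor x factor blocks of a rectangular 2-D grid."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 8 field of 2.5-degree cells: left half 1.0, right half 3.0.
fine = [[1.0] * 4 + [3.0] * 4 for _ in range(4)]
coarse = coarsen(fine)  # two 10-degree cells
```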

Rainfall Forecasts
The performance of the linear and exponential models evaluated with the NSE criterion provided values in the ranges of (0.509, 0.869) and (0.627, 0.888), respectively, for the 20 pixels where rain gauges are located (Table 3). For these pixels, the NSE results of the stepwise regression model were between 0.585 and 0.889. Sittichok et al. [13] also used a stepwise model and obtained an NSE of 0.387 (Table 5).
Rainfall estimates with the polynomial model and the SST of the tropical north Atlantic yielded the best results. NSE varies between 0.629 and 0.905, with a lag of 11 months. Forecasts were better in the southern parts of the Bani and Senegal River basins. However, in the northern part of the Senegal River basin (desert area), none of the four models was able to reproduce the rainfall under the selection criteria. To have one model for each pixel of this dry region, restrictions on the criteria were removed, and the models with the lowest AIC were selected. It seems that SST alone cannot explain the dynamics of precipitation in the Sahara. This region is characterized by very low annual rainfall (<100 mm). Several factors may cause this permanent drought. The high temperatures mean that the sea humidity does not reach the area, and the extremely overheated winds act as barriers to the arrival of this humidity. Other factors should be considered for seasonal rainfall forecasting in this area. Furthermore, for the pixels of the Bani River basin, the minimum NSE was at least 0.696 across the four models, and the lowest of their maximum values was 0.869. The corresponding values for the pixels of the Senegal River basin were 0.164 and 0.855.
The monthly rainfall forecasts using the polynomial model of each pixel were calculated for the historical period of PERSIANN-CDR. Appendix A (Table A1) presents forecasts for the two basins. Comparisons between forecasts and PERSIANN-CDR, with their corresponding relative error (RE), are given in Appendix A (Table A2) for the 20 pixels where gauges are located. RE is small for all pixels except the one corresponding to Tidjikja, located in the Mauritanian desert. The overall mean relative error was about 12.6%.
The findings show that the SST of the northern Tropical Atlantic and the SST of the Gulf of Guinea have strong teleconnections with rainfall in the Sahel. The average NSE was 0.80, and the maximum NSE was 0.926. MAE was ~30 mm month⁻¹, and the polynomial model was the one with the lowest error. Gado Djibo et al. [31] obtained satisfactory results with the Bayesian method of multiple change points and air temperature (NSE = 0.76, lag = 14 months). In another study, Gado Djibo et al. [2] combined linear models and reported the following results: with air temperature (NSE = 0.53, lag = 7 months), with SST (NSE = 0.34, lag = 12 months), and with RHUM (NSE = 0.52, lag = 8 months). It is worth noting that a non-linear model does not always turn out to be better than a linear model [31]. Results from the polynomial model are 26% and 42% better than those reported in previous studies [2,31] and 46% better than the model of [12] (Table 5).
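For reference, the NSE criterion used in these comparisons can be computed as in the following sketch, which uses the standard Nash–Sutcliffe definition (one minus the ratio of the squared forecast errors to the variance of the observations around their mean). The function name and sample values are illustrative and not taken from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    n = len(observed)
    if n == 0 or n != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    if variance_term == 0:
        raise ValueError("observed series must not be constant")
    return 1.0 - squared_error / variance_term


# Hypothetical monthly rainfall (mm); not data from the study.
observed = [10.0, 20.0, 30.0]
nse_perfect = nash_sutcliffe(observed, observed)
nse_mean = nash_sutcliffe(observed, [20.0, 20.0, 20.0])
```

A value of 1 indicates a perfect match, while 0 means the forecast is no better than the mean of the observations.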

Conclusions
The availability of long series of global satellite-based meteorological products with high spatial and temporal resolution is increasingly facilitating and stimulating the implementation of rainfall forecasting models, particularly in undergauged regions, and all the more so given the challenges that arise in Africa due to climate change and intensifying rainfall variability. The forecasting models using these satellite datasets can provide valuable up-to-date information useful to decision makers. Millions of people in the Sahel region, who have suffered from the effects of droughts and floods, will be able to benefit from the information provided by these forecasting tools to better plan activities related to water, particularly for agriculture. During the last decades, several authors have tried to develop models that would predict whether the rainy season would be wet, normal, or dry in the Sahel region. Statistical seasonal rainfall forecasting models are more often used than physically based models because of their simplicity.
This research dealt with seasonal rainfall forecasting for the Bani and Senegal basins. The Atlantic sea surface temperature (SST), the mean sea level pressure (MSLP), the relative humidity (RHUM), and five El Niño indices were used as explanatory variables, and PERSIANN-CDR rainfall data were used as the dependent variable.
Linear, polynomial, exponential, and stepwise regression models were developed to forecast rainfall for each of the 725 pixels of the two basins. These models were built using 70% of the satellite datasets available from 1983 to 2020. The remaining 30% was used for model validation. The strategy was to find a model that satisfied two criteria: an adjusted R² greater than 0.5 with the lowest AIC value, and a lag (lead time) greater than or equal to six months, a reasonable time for authorities to take preventive measures.
Based on the principal component analysis and cluster analysis, three SST, three RHUM, and two MSLP homogeneous regions were defined. The study revealed that the North Atlantic SST, a region of approximately 11.5 × 10⁶ km² (16° N-40° N and 9° W-39° W), has a better teleconnection with rainfall in the two basins with a lead time of eleven months. The Gulf of Guinea SST also has a suitable teleconnection with rainfall over the region but with a lead time of five months. Finally, all models were built using datasets of the North Atlantic region with a lead time of eleven months.
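The selection strategy described above can be sketched as a simple filter followed by a minimum search. This is an illustration under stated assumptions, not the procedure actually coded for the study: the `Candidate` class, its fields, the Gaussian least-squares form of AIC (defined up to an additive constant), and the sample numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    lag_months: int
    n_params: int   # fitted coefficients, including the intercept
    rss: float      # residual sum of squares on the training data
    tss: float      # total sum of squares on the training data
    n_obs: int      # number of training observations


def adjusted_r2(c: Candidate) -> float:
    r2 = 1.0 - c.rss / c.tss
    # n_obs - n_params equals n - p - 1 for p predictors plus an intercept.
    return 1.0 - (1.0 - r2) * (c.n_obs - 1) / (c.n_obs - c.n_params)


def aic(c: Candidate) -> float:
    # Assumed least-squares AIC: n * ln(RSS / n) + 2k.
    return c.n_obs * math.log(c.rss / c.n_obs) + 2 * c.n_params


def select_model(candidates, min_r2_adj=0.5, min_lag=6):
    """Keep candidates meeting both criteria, then return the lowest-AIC one."""
    eligible = [c for c in candidates
                if adjusted_r2(c) > min_r2_adj and c.lag_months >= min_lag]
    return min(eligible, key=aic) if eligible else None


# Hypothetical candidates for a single pixel; not results from the study.
candidates = [
    Candidate("linear", 11, 2, 40.0, 100.0, 100),
    Candidate("polynomial", 11, 3, 20.0, 100.0, 100),
    Candidate("short_lag", 3, 3, 10.0, 100.0, 100),
    Candidate("weak_fit", 8, 2, 70.0, 100.0, 100),
]
best = select_model(candidates)
```

In this example the short-lag candidate has the lowest AIC but is rejected by the six-month lead-time requirement, so the polynomial candidate is retained.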
All four models provided suitable results in all the pixels of the Bani River basin based on the numerical criteria; the smallest NSE value was about 0.696 with the linear model.
For the pixels of the Senegal River basin, each of the four models provided suitable results, but only up to approximately latitude 16.5° N. None of the models was able to forecast rainfall with reasonable precision in the far north of the Senegal River basin, which corresponds to the Sahara. For this region, other predictors (such as temperature, wind speed, and direction) or other types of models should be considered.
The results of the stepwise regression model are not very different from those of the linear model. This means that the other explanatory variables did not contribute much to explaining the phenomenon. This is understandable because SST is the driving force of many of these variables.
The best of the four models for rainfall forecasting in the study area was the second-order polynomial model. For the Bani River basin, values of NSE were between 0.751 and 0.926, with a mean of about 0.867. However, for the Senegal River basin, the minimum, maximum, and mean values were 0.133, 0.916, and 0.711, respectively. Moreover, the relative error calculated for the pixels where the rain gauges are located was generally quite low, sometimes even close to zero. Overall, the polynomial model gives suitable rainfall forecasts.
The statistical models used in this study are easy to apply, and the satellite datasets are accessible to users. Finally, this study aims to make a significant contribution to improving the effectiveness of forecasts more than six months in advance, which is enough time for agricultural planning and decision making.
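A second-order polynomial model with a lagged predictor can be sketched as an ordinary least-squares fit of rainfall at month t against the predictor at month t minus the lag. The code below is a minimal pure-Python illustration under assumptions: the function names are hypothetical, the data are synthetic anomalies rather than the study's SST or PERSIANN-CDR series, and the fitting procedure actually used by the authors is not described in this section.

```python
def lagged_pairs(predictor, rain, lag):
    """Pair rain[t] with predictor[t - lag] for all t >= lag."""
    if len(predictor) != len(rain):
        raise ValueError("series must have the same length")
    return predictor[: len(rain) - lag], rain[lag:]


def solve_3x3(matrix, rhs):
    """Gaussian elimination with partial pivoting for a 3 x 3 system."""
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(x, y):
    """Least-squares coefficients [a0, a1, a2] of y = a0 + a1*x + a2*x^2."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve_3x3(normal, t)


def predict(coef, x):
    return coef[0] + coef[1] * x + coef[2] * x * x


# Synthetic demonstration only: rainfall generated from an 11-month-lagged anomaly.
LAG = 11
sst_anom = [-1.5 + 0.25 * (i % 12) for i in range(60)]
rain = [0.0] * LAG + [100.0 + 40.0 * x + 10.0 * x * x for x in sst_anom[: 60 - LAG]]
xs, ys = lagged_pairs(sst_anom, rain, LAG)
coef = fit_quadratic(xs, ys)
```

Because the synthetic rainfall is an exact quadratic of the lagged anomaly, the fit recovers the generating coefficients; with real data the residuals would feed into the NSE and AIC evaluations.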

Acknowledgments:
The authors would like to thank the OMVS and the National Meteorological and Hydrological Services of Cote d'Ivoire, Guinea, Mali, Mauritania, and Senegal for providing some meteorological data. This research has also been partially funded by the National Council for Science and Technology of Mexico (CONACyT) through a doctoral scholarship. Thanks to Mamoudou BA of NOAA and Alin CARSTEANU of IPN for their suggestions. In addition, thanks to the four anonymous reviewers for their contributions.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure 1. Location of the basins of the Bani River at Beneny Kegny (upper Niger) and the Senegal River at Bakel on a digital elevation model (DEM). Location of rain gauges (blue dots) and buoys in the Atlantic located off the coast of West Africa (magenta boxes) and the Sahel region (yellow stripe).

Figure 3. Spatial distribution of models that best reproduce rainfall in each pixel of the Bani and Senegal River basins.

Figure 5. Spatial-seasonal distribution of rainfall forecast with the polynomial model, sample from the years 2005 to 2020.

Figure 6. Comparison between the forecast for 2021 and PERSIANN-CDR monthly rainfall (mm) using the monthly frequency hyetograph (MFH) of the pixel where the rain gauge is located.

Figure 7. Classification of the seasonal rainfall forecast for 2021 over the Bani River and Senegal River watersheds in terms of wet, normal, and dry.

Table 1. Validation of monthly precipitation of the PERSIANN-CDR product with the observed data of 20 rain gauges between 1983 and 2014.

Table 2. Validation of the SST ERA5 with monthly mean temperature in 5 Atlantic buoys between 1997 and 2019.

Table 4. Validation of the forecasted rainfall (FRC) in a 4-pixel sample with the "observed" rainfall PERSIANN-CDR (CDR) at point-to-pixel for the last 5 years.

Table 5. Comparison of results of the precipitation forecast models of this study with previous studies.

Table A1. Monthly rainfall forecasts (mm) in the Bani and Senegal River basins.

Table A2. Validation of the forecasts (FRC) with PERSIANN-CDR rainfall (CDR) at pixel-to-point for the last 5 years (complement of Table 4).