Climatology and Trend of Severe Drought Events in the State of Sao Paulo, Brazil, during the 20th Century

Abstract: Drought is a natural hazard with critical societal and economic consequences to millions of people around the world. In this paper, we present the climatology of severe drought events that occurred during the 20th century in the region of Sao Paulo, Brazil. To account for the effects of rainfall deficit and changes in temperature at a climatic timescale, we chose the Standardized Precipitation Evapotranspiration Index (SPEI) to identify severe droughts over the city of Sao Paulo, and the eastern and central-western regions of the state. Events were identified using weather station data and European Centre for Medium-Range Weather Forecast (ECMWF) reanalysis data, in order to assess the representation of drought periods in both datasets. Results show that the reanalysis seems suitable to represent the number of events and their mean duration, severity and intensity, but the timing and characteristics of individual events are not well reproduced. The correlation between observation and reanalysis SPEI time series is low to moderate in all cases. A linear trend analysis between 1901 and 2010 shows a tendency of increasing (decreasing) severe drought events in the central and western (eastern) Sao Paulo state, according to observational data. This is in agreement with previous findings, and the reanalysis presents this same signal. The weakened trend values in the reanalysis may be associated with issues in representing precipitation in this dataset.


Introduction
Drought is a natural hazard that occurs throughout the world and may affect millions of people [1] through its damaging effects on agriculture [2,3], reduced water availability for consumption and daily use, or problems in electricity generation [4]. It is the consequence of a prolonged period with reduced precipitation that results in water shortage [5]. The lack of precipitation is usually followed by an increase in temperature and a reduction of atmospheric humidity. This leads to enhanced evapotranspiration, which aggravates the drying and the water demand [6].
Normalized drought indexes such as the Standardized Precipitation Index (SPI) [7] and the Standardized Precipitation Evapotranspiration Index (SPEI) [8] have been used in recent years to identify droughts in different regions of Brazil [9][10][11][12][13]. Blain and Brunini [14] performed a comparison between the SPI and other indexes based on the Palmer method [15], concluding that the standardized index is suitable for use in studies of droughts at different timescales in the region. Blain [16] investigated the low frequency variability and trends in a city near Sao Paulo (capital) using the SPI. They found very little influence of the El Niño Southern Oscillation in drought frequency, and no significant trends.
More recently, Pereira et al. [17] used SPI and SPEI to identify an increase in drought events over Sao Paulo state during the first months of rainy season, with serious consequences on agriculture.
The occurrence of drought events in the state of Sao Paulo, located in southeastern Brazil, has profound societal and economic impacts. Sao Paulo is the most densely populated state in the country, with an important agricultural production (corresponding to 20% of the Brazilian agribusiness gross domestic product [18]) heavily based on rain-fed crops. In addition, the state is an important hydro-power generator. Drought events in this area may cause a serious water supply crisis for millions of people, as occurred during the drought of 2014/2015 [19]. A better understanding of drought climatology in the region can be useful to assess the regional predictability and help in mitigation strategies. Although a negative trend for severe droughts has been observed in Sao Paulo in the last decades [20], the occurrence of three severe events in 1990, 2000-2001 and 2012-2015 [21] indicates that it is still a fairly regular phenomenon that needs to be studied in more detail due to its strong impacts on society.
The study of long-term drought variability and the atmospheric patterns that lead to a lack of precipitation may depend on atmospheric and oceanic reanalyses. Studying droughts over a long time span is important to identify the low frequency atmospheric variability associated with these phenomena in a complex subtropical region [22] and also to assess trends, which determine the regional impacts of climatic changes [23].
In this context, the aim of this work is to identify and characterize severe drought episodes that occurred across the state of Sao Paulo and its capital city during the 20th century, analyzing the temporal distribution and trends in the period. The study was conducted using observed data and reanalysis, to assess the representation of drought events in the latter.

Data and Methodology
Drought events result from a prolonged lack of precipitation. Depending on the timescale of these dry periods, they affect different water resources: meteorological conditions and agriculture respond to relatively short timescales, while streamflows and reservoirs start to be affected by precipitation anomalies lasting longer than 6-12 months. The SPI was created to account for this large temporal variation of the dry periods, and to better analyze the different impacts produced. Its calculation depends only on the monthly precipitation record from a specific location. This record is fitted to a probability distribution and normalized, so positive (negative) values of SPI indicate precipitation greater (less) than the median at the location. The greatest strength of this index is the ability to quantify precipitation anomalies at multiple timescales, assessing different hazards such as meteorological droughts (1-2 months), agricultural droughts (1-6 months) or hydrological droughts (6 months or longer) [24]. Drought events are divided into four categories [7], according to SPI values: extreme (SPI ≤ −2.00), severe (−1.50 > SPI > −1.99), moderate (−1.00 > SPI > −1.49) and mild (0 > SPI > −0.99).
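The four categories above can be written as a small lookup. The following is a minimal sketch, not code from the study: the function name `classify_spi` is hypothetical, and because the two-decimal ranges quoted in the text leave boundary values such as exactly −1.50 unassigned, the sketch assumes each boundary belongs to the drier class.

```python
def classify_spi(spi: float) -> str:
    """Return the drought category for a standardized index value.

    Assumption: boundary values (-2.0, -1.5, -1.0) are assigned to the
    drier class; the ranges quoted in the text do not settle this.
    """
    if spi <= -2.0:
        return "extreme"
    if spi <= -1.5:
        return "severe"
    if spi <= -1.0:
        return "moderate"
    if spi < 0.0:
        return "mild"
    return "no drought"
```

The same thresholds apply unchanged to SPEI values, since both indexes are expressed in standard normal units.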
SPEI [8] uses a method similar to SPI but instead of using only precipitation data, it estimates the drought by the monthly difference between precipitation and potential evapotranspiration (PET). The main advantage of this index is the inclusion of air temperature in the calculations (through PET), thus enabling studies of temperature and climate change impacts on droughts [25,26]. Monthly water balance (precipitation minus PET) may also be computed for multiple timescales, with the values standardized by fitting the results to a log-logistic probability distribution.
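The sequence of steps (monthly water balance, accumulation over a timescale, standardization) might be sketched as follows. This is an illustration under stated assumptions rather than the SPEI algorithm itself: instead of fitting a log-logistic distribution, the sketch converts rank-based empirical probabilities (Gringorten plotting position) to standard normal quantiles, and it pools all months together, whereas an operational calculation would normally treat each calendar month separately. All function names are hypothetical.

```python
from statistics import NormalDist


def accumulate(values, k):
    """Rolling k-month sums; the first k-1 entries are None."""
    out = [None] * len(values)
    for i in range(k - 1, len(values)):
        out[i] = sum(values[i - k + 1 : i + 1])
    return out


def standardize_empirical(series):
    """Map each valid value to a standard normal quantile by its rank.

    Assumption: empirical probabilities replace the log-logistic fit.
    Tied values receive consecutive ranks in order of appearance.
    """
    valid = sorted((v, i) for i, v in enumerate(series) if v is not None)
    n = len(valid)
    out = [None] * len(series)
    for rank, (_, i) in enumerate(valid, start=1):
        p = (rank - 0.44) / (n + 0.12)  # Gringorten plotting position
        out[i] = NormalDist().inv_cdf(p)
    return out


def spei_sketch(precip, pet, k):
    """Standardized k-month water balance (precipitation minus PET)."""
    balance = [p - e for p, e in zip(precip, pet)]
    return standardize_empirical(accumulate(balance, k))
```

Calling `spei_sketch(precip, pet, 3)` gives a rough analogue of SPEI 3; the parametric fit used in the paper behaves better in the tails, which matters for severe events.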
In this study, we used the SPEI to identify severe drought events that occurred in the Sao Paulo state and specifically in the capital, the city of Sao Paulo. An event starts in the month when the SPEI falls below −1.50 and lasts until the index is again greater than this threshold. The value of −1.50 follows the World Meteorological Organization (WMO) definition [24] and the threshold commonly used in other studies focusing on severe droughts [27,28]. SPEI was calculated using the timescales of 3, 12 and 48 months, and the resulting drought indexes are designated SPEI 3, SPEI 12 and SPEI 48, respectively. A timescale of 3 months was chosen to represent drought impacts on agriculture such as dryness during the growing season [17] and management practices [29]. The 12-month timescale was selected to reflect hydrological consequences of drought such as decreased energy production [29]. The longer timescale, 48 months, was used to assess the occurrence and characteristics of persistent severe drought such as the 2014-2015 event [19].
In order to standardize the index, we have used the log-logistic probability distribution [8]. The entire study period was used for the distribution fitting, because it seems to be the most recommended choice when the datasets present both increasing temperature and precipitation trends [30]. Such positive trends were confirmed in all our datasets and regions of study prior to the SPEI calculation (results not shown). Also, following Vicente-Serrano et al. [8], the fitting of the data to the distribution was later validated from probability-weighted moments.
PET was calculated with the Thornthwaite formulation, which has been shown to be adequate for the São Paulo state region [31]. Monthly precipitation totals and mean air temperature data for the period 1901 to 2010 were obtained from the Climate Research Unit time-series (CRU) TS 4.01 gridded dataset [32] with horizontal resolution of 0.5°, and from the European Centre for Medium-Range Weather Forecast (ECMWF) ERA20C reanalysis [33], with 125 km horizontal resolution. The CRU climatology was reconstructed using careful time uniformity tests, and the precipitation in this dataset shows good global agreement with other rain-gauge-based observations (Global Precipitation Climatology Centre-GPCC, University of Delaware-UDEL) and satellite products (Global Precipitation Climatology Project-GPCP) [34]. ERA20C also shows a good correlation with rain-gauge observations, mainly after the 1930's, and near zero and stable precipitation minus evaporation anomalies [33]. However, previous studies reported that the representation of drought periods by the consecutive dry days index has little to no confidence in ERA20C. The reanalysis overestimates this index from 1901 to 1980 (especially during the decades of 1940-1950) and slightly underestimates it during 1980-1990s [35].
The state of Sao Paulo was divided into two zones of analysis, corresponding to homogeneous regions for precipitation probability distributions [36]: an eastern zone near the South Atlantic Ocean, where precipitation is slightly more evenly distributed during the year (due to the influence of the ocean and the climatological displacement of cold fronts), hereafter called ESP (25° S-22.5° S; 50° W-44° W), and a zone that encompasses the central and western portions of the state, hereafter called CWSP (23° S-20° S; 53° W-47° W). These areas are marked in Figure 1. Atmospheric variables were computed as an areal average over these regions, and the obtained SPEI is considered a mean value for the region.
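For reference, the standard Thornthwaite equation can be sketched in a few lines. This is a generic textbook form, not the authors' implementation: the function names are hypothetical, the mean day length and the number of days in the month are passed in by the caller (defaults give the unadjusted value), and the correction sometimes applied to very high temperatures is omitted.

```python
def heat_index(monthly_mean_temps):
    """Annual heat index I from 12 climatological monthly means (deg C).

    Months with non-positive mean temperature contribute nothing.
    """
    return sum((t / 5.0) ** 1.514 for t in monthly_mean_temps if t > 0)


def thornthwaite_pet(temp_c, annual_heat_index, day_hours=12.0, days_in_month=30):
    """Monthly PET in mm for one month with mean temperature temp_c.

    Assumption: day_hours (mean day length) and days_in_month are
    supplied by the caller; no high-temperature correction is applied.
    """
    if temp_c <= 0 or annual_heat_index <= 0:
        return 0.0
    i = annual_heat_index
    a = 6.75e-7 * i**3 - 7.71e-5 * i**2 + 1.792e-2 * i + 0.49239
    unadjusted = 16.0 * (10.0 * temp_c / i) ** a
    return unadjusted * (day_hours / 12.0) * (days_in_month / 30.0)
```

Because temperature is the only meteorological input, any temperature bias in a dataset carries straight through to PET and therefore to the SPEI.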
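A regional mean over such a latitude-longitude box could be computed along the following lines. The text does not state whether grid cells were weighted by area, so the cosine-of-latitude weighting here is an assumption, as are the function name `box_mean` and the tuple layout of the box constants (south, north, west, east, with negative values for degrees south and west).

```python
import math

# Box limits taken from the text: (lat_min, lat_max, lon_min, lon_max).
ESP_BOX = (-25.0, -22.5, -50.0, -44.0)
CWSP_BOX = (-23.0, -20.0, -53.0, -47.0)


def box_mean(cells, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean of grid-cell values inside a lat/lon box.

    cells is an iterable of (lat, lon, value) tuples. Weights are
    cos(latitude), an assumed convention for regular grids.
    """
    num = 0.0
    den = 0.0
    for lat, lon, value in cells:
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            w = math.cos(math.radians(lat))
            num += w * value
            den += w
    if den == 0.0:
        raise ValueError("no grid cells inside the box")
    return num / den
```

For boxes only a few degrees tall, as here, the weighted and unweighted means differ very little.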
To study the drought events in the city of Sao Paulo, observed meteorological data from the Instituto de Astronomia, Geofísica e Ciências Atmosféricas (IAG) weather station were obtained. The events were identified from 1933 to 2010, the period when both the station and the ERA20C reanalysis have data for this site. For completeness, the drought events were also assessed using CRU, for a validation of this dataset against the station data.
The coordinates of the IAG station are 23°39′ S-46°37′ W. Its precipitation data are a suitable measurement to represent the impacts of drought on the water supply for the population, as it is located within 10 km of the main water reservoirs of the city. Due to grid resolution, the data from the ERA20C reanalysis and from CRU were obtained from a point 70 kilometers away from these coordinates. This distance is greater than that recommended by WMO for the representativeness of a weather station, but we believe that it will not have a substantial impact on our results because drought events in Sao Paulo are usually associated with larger scale synoptic conditions (displacement and strengthening of the South Atlantic Subtropical High).
Following the identification of events, some indicators were calculated: duration (number of months when the SPEI presented values below the −1.50 threshold), severity (absolute sum of SPEI values during the event) and intensity (ratio between severity and duration).
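The event definition and the three indicators described above translate directly into a scan over the monthly index series. The sketch below is illustrative only: `find_severe_events` and the dictionary keys are hypothetical names, and missing months (given as `None`) are assumed to end an ongoing event.

```python
def find_severe_events(spei, threshold=-1.5):
    """Locate runs of consecutive months with SPEI below the threshold.

    Returns one dict per event with its start index, duration (months),
    severity (absolute sum of SPEI) and intensity (severity / duration).
    Assumption: a missing month (None) terminates an ongoing event.
    """
    values = list(spei)
    events = []
    start = None
    # The trailing None acts as a sentinel that closes a run at the end.
    for i, value in enumerate(values + [None]):
        below = value is not None and value < threshold
        if below and start is None:
            start = i
        elif not below and start is not None:
            run = values[start:i]
            severity = abs(sum(run))
            events.append(
                {
                    "start": start,
                    "duration": len(run),
                    "severity": severity,
                    "intensity": severity / len(run),
                }
            )
            start = None
    return events
```

For the series `[0.2, -1.6, -2.0, -1.4, -1.7, 0.1]` this yields two events: one lasting two months with severity 3.6, and one lasting a single month with severity 1.7.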
The correlation between SPEI time series in different datasets was represented by the Spearman coefficient with the significance level (95%) calculated by a t-test. Decadal trends of severe drought occurrences were investigated for the datasets using linear regression analysis. The positive/negative slopes of regression lines (obtained by least square linear fitting) are associated with increasing/decreasing trends. Trend significance was assessed by the nonparametric Mann-Kendall test [37].
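The three statistics named above can be sketched with the Python standard library alone. This is a simplified illustration, not the code used in the study: all function names are hypothetical, the Mann-Kendall variance omits the correction for tied values, its p-value uses the normal approximation, and the Spearman t statistic is returned without a p-value (it is undefined when the coefficient is exactly ±1).

```python
from math import sqrt
from statistics import NormalDist, mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_t(r, n):
    """t statistic (n - 2 degrees of freedom) for a coefficient r."""
    return r * sqrt((n - 2) / (1.0 - r * r))


def ols_slope(x, y):
    """Least-squares slope of y against x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def mann_kendall(y):
    """Return (S, Z, two-sided p). Assumption: no tie correction."""
    n = len(y)
    s = sum(
        (y[j] > y[i]) - (y[j] < y[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p
```

For a decadal trend, `x` would be the decade index and `y` the number of severe events counted in each decade; `ols_slope(x, y)` then gives the change in events per decade, and `mann_kendall(y)` its significance.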

São Paulo City
According to SPEI 3 obtained from the IAG weather station data, 36 (Table 1), usually with greater SPEI values, as can be seen in the four main severe events (Figure 2d). The time series of the two datasets do not coincide very well, resulting in a low correlation (Table 1).
Main severe drought episodes for observed SPEI 12 occurred in the following decades: 1930, 1940, 1950, 1960, 1990 and 2000 (Figure 2b). In the reanalysis, the drought in the 1930's was longer, more intense and displaced to the end of the decade, while the ones in the beginning of the 1940's, in the middle of the 1950's and in the first years of the 2000's were weaker than those recorded by IAG (Figure 2e). The duration and intensity of the 1960's drought, one of the most important events during the period, were well represented by ERA20C. Only the event in the 1990's was more intense in the reanalysis. The drought events of the 1950's, 1960's and 2000's in this timescale were also identified by [8], using a different dataset. Again, the number of events was smaller in the reanalysis and the correlation was only moderate (Table 1).
In SPEI 48 (Figure 2c,f) the severe drought of the 1930's was more intense in ERA20C, while the 1940's event intensity and duration were underestimated by the reanalysis. Less intense events in the reanalysis compared to observations also occurred in the 1960's and 1980's. The number of events in ERA20C was half that observed by the weather station, and the time series present a moderate correlation (Table 1).
The SPEI obtained using CRU data is very similar to that obtained from IAG. These time series present a strong correlation (r = 0.8, significant at the 95% confidence level), and an equal number of severe drought events.
Larger differences between CRU and IAG occurred due to severe and extreme wet events, while the dry events are represented in a fairly similar way (Figure 2g-i).
The mean duration and severity of severe drought events in IAG and ERA20C were similar for timescales of 3 and 48 months, with the reanalysis presenting a greater dispersion. For SPEI 12, both indicators show larger values in ERA20C. The mean intensity was slightly underestimated in the reanalysis for all timescales, with a larger variability in the observations for SPEI 3 (Figure 3).
ERA20C data do not represent the interdecadal variability of severe drought events, as shown in Figure 4. Regression analysis for the observed time series indicates a negative tendency for decadal occurrence of severe drought in Sao Paulo. The timescales of 12 and 48 months present a more pronounced tendency, with statistical significance above 90%, while SPEI 3 has a much weaker tendency during the period (Table 1). The trend is not well reproduced in the reanalysis: although there is a negative tendency in all timescales, it is greater in SPEI 3, and much smaller in SPEI 12 and 48. Moreover, the reanalysis trends are not statistically significant.

Eastern São Paulo state (ESP)
ERA20C presents 10 more severe drought events than CRU in the period 1901-2010 (Table 2) for SPEI 3 in the ESP region. The important extreme event of the 1960's is displaced to the 1970's (Figure 5a,d) and, overall, the time series are not correlated (Table 2).
Table 2. Number of events, Spearman correlation coefficient and trend for observation (CRU) and reanalysis (ERA20C) data in the ESP region, from 1901 to 2010. The trend is represented by the slope of the regression line. * indicates the significant trend (according to MK test) or correlation coefficient (according to t-test) at α = 0.1 and ** indicates the significant trend/coefficient at α = 0.05. Columns: Number of Events-ERA20C, Number of Events-CRU, Correlation Coefficient, Trend-ERA20C, Trend-CRU.
When considering SPEI 12 (Figure 5b,e), the severe droughts of the 1910's and 1920's were present in the reanalysis, but more as short intense episodes, while the observation shows longer-lasting events. Droughts in the 1930's, 1940's and 2000's are fairly well represented in ERA20C, while the event of the 1980's appears displaced. The extreme event of the 1960's occurred in a similar period in observation and reanalysis, but the latter underestimated the severity. The number of events was the same in ERA20C and CRU in this case, but the correlation, though statistically significant, was still low (Table 2).
For the timescale of 48 months (Figure 5c,f) the events of the 1910's and 1920's appear in the reanalysis with a longer duration and weaker severity than in the observations. Severe droughts in the 1930's and 1940's are also present in both datasets, but with distinct severity. Drought events after the 1950's do not reach the severe threshold in either the observations or the reanalysis. ERA20C was able to identify half of the observed events, and the correlation was moderate.
Duration, severity and intensity (Figure 6) are similar in both datasets for SPEI 3 and 12. For SPEI 48, all indicators show a smaller median in the reanalysis. The dispersion is greater in the reanalysis for SPEI 3, and the opposite occurs for SPEI 12.
The interdecadal variability of severe droughts (Figure 7) in CRU shows a small and not statistically significant negative tendency for SPEI 12, and an even smaller tendency for other timescales. ERA20C overestimates the negative tendency for SPEI 3 (Table 2).

Central western São Paulo State (CWSP)
For this region, the most important severe drought events in SPEI 3 calculated with CRU data occurred in the 1960's and 1980's (Figure 8a). The two events of the 1960's were fairly well represented in the reanalysis, though displaced in time, but ERA20C was not able to represent the 1980's event (Figure 8d). The reanalysis time series presents a few more events than observed, and the correlation between the two series is nearly null (Table 3).
The severe droughts of the 1960's (which also occur in the reanalysis, Figure 8e, but with weaker intensity) and of the 1980's (which in the reanalysis may be represented by the weaker events in the 1990's) are also evident in the observed SPEI 12 (Figure 8b). The correlation coefficient is again very low, but both series present a similar number of events (Table 3).
Nearly the same number of events was detected in both datasets also in SPEI 48, but the distribution of events in time is very different (Figure 8c,f), resulting again in a low correlation coefficient (Table 3). A long and intense drought is present in the reanalysis around 1940, but it is not recorded in observations. On the other hand, the severe droughts of the 1950's, 1960's and 1970's are weaker in ERA20C. The drought in the first years of the 2000's is overestimated. It is worth noticing that in all timescales the last years of CRU observations show positive SPEI values, while the ERA20C maintains an overestimated drought period until the last available year.
The mean characteristics, and even the dispersion, of severe droughts in SPEI 3 and 12 are very similar in both datasets (Figure 9). In SPEI 48, the observed duration and severity are slightly larger, and for all three indicators CRU presents a higher dispersion than the reanalysis.
Figure 8. As in Figure 5, but for the CWSP region.
Figure 9. As in Figure 6, but for the CWSP region.
Figure 10. As in Figure 7, but for the CWSP region.
It is evident in the time series of events per decade according to SPEI 3 and 12 that severe droughts became more frequent over CWSP throughout the 20th century (Figure 10). The trend is confirmed by the regression coefficients (Table 3). Reanalysis captures this positive tendency, but with smaller magnitude and no statistical significance. For SPEI 48, the observed trend is small and not significant; in this case, the trend is almost null in the reanalysis.

Comments
Differences between the SPEI time series obtained from ERA20C and CRU (Figures 2, 5 and 8) might occur due to the representation of basic atmospheric variables in the reanalysis. In order to discuss this, Figure 11 presents the annual mean of accumulated precipitation and air temperature in both datasets. Comparing Figure 11a with Figure 2e,f,h,i indicates that the underestimation of a severe drought event in the 1950s in Sao Paulo city by ERA20C may be associated with overestimated precipitation and lower temperatures during this period, compared to CRU. However, the fact that drought was overestimated in ERA20C during the 1930's and 1980's, when the reanalysis also presented more precipitation and even lower mean temperatures, suggests that this relationship is not strongly direct. Similar behavior was observed over the ESP region. Over the CWSP region, the annual mean temperature in ERA20C is systematically colder than in CRU (Figure 11d), so differences in the SPEI should be related to precipitation. Indeed, it can be seen that the ERA20C overestimation of severe events in the 1940's and 2000's (Figure 8b,c,e,f) coincides with less precipitation in the reanalysis (Figure 11c). As in the case of ESP, the relationships between these variables and severe drought events are not so obvious, suggesting that other effects play a role in the determination of the index. One of these effects may be the wind speed near the surface. Wohland et al. [38] show that for the entire period of reanalysis, ERA20C presents a positive trend of 10-m wind speed in the region of Sao Paulo, while the observations indicate a negative trend. This overestimation of the wind speed has consequences for the potential evapotranspiration calculation, leading to differences in SPEI. SPEI calculation using evapotranspiration equations that account for the effects of wind speed, and an evaluation of the local representation of wind speed by the reanalysis, are planned for a future paper.
Figure 11 shows that there is a visible discrepancy between the time series of the variables. Most notably, a large difference can be seen between the CRU and ERA20C mean temperature for region CWSP (Figure 11d). This difference clearly impacts the SPEI calculation because of the PET formula chosen. This leads us to ask whether at least one of these datasets (and if so, which one) really represents the actual climate conditions in the studied area, and therefore renders reliable SPEI values. For São Paulo city, correlation is high between CRU and IAG (Section 3.1), showing that CRU is representative for this location.
Regarding the regions ESP and CWSP, the Spearman correlation coefficient was calculated between the datasets and data obtained from weather stations located in each region. Observed data are scarce over these regions, so we had only six  Table 3 presents the coefficients, and it can be seen that a high correlation occurs between CRU and the stations, especially for mean air temperature. In addition, it was seen (results not shown) that the mean temperature deviation between CRU and the stations is smaller than between ERA20C and the stations, indicating that ERA20C has indeed a systematic significant cold bias in the region. Thus, CRU is a suitable dataset for representing the climate and for SPEI calculation in both regions. The correlation for ERA20C is lower, as expected, but still moderate to high in all cases except for the Votuporanga station.
As the analyses presented in Sections 3.2 and 3.3 were developed using SPEI values obtained from an areal average of meteorological variables, it is necessary to address the question: are these average index values representative of the droughts over the study region? We can have an indication of this by computing the Spearman correlation coefficient between the SPEI areal average value and that calculated from time series obtained in single points inside the areas (Table 4). For consistency, both indexes were calculated using CRU and ERA20C data. The points chosen were the same as the weather stations (described above). For the CWSP region, correlation coefficients calculated from CRU are moderate to high, with values varying from 0.57 to 0.75. Higher correlations were observed for the ESP stations, and this may occur due to the fact that the precipitation regime is linked to more organized synoptic systems (like cyclones and anticyclones), so the drought conditions are better represented by the reanalyses than in the case of rainfall from isolated convection and associated with local factors, as is more common in the central and western parts of the state. In ESP, higher correlations occurred for SPEI 3, indicating that droughts in this timescale are more generalized in space than longer events. Overall, the average SPEI value gives acceptable information for the whole analyzed area. This is also true for ERA20C, for which all correlations between area average and single point series are equal to or higher than 0.80. The better performance of ERA20C in comparison to CRU was expected because the former dataset has a coarser resolution, so point values represent larger areas in ERA20C than in CRU. The areal average SPEI procedure has also been successfully applied for large-scale drought studies in Europe [39][40][41], and is now once more validated by these results.
Some trends found in the present work corroborate previous research regarding droughts in Sao Paulo. The tendency of increasing (decreasing) severe droughts per decade in CWSP (ESP) during the 20th century found here is in agreement with [42], who used the Palmer Drought Severity Index to show a drying tendency over CWSP (and most of central Brazil) from 1950 to 2008. The similarity between the results of this previous work and those of the present work suggests that the different trends in the two regions do not arise from inhomogeneities in the ERA20C and CRU precipitation and temperature fields. According to the Brazilian National Institute of Meteorology, weather station data were very scarce over a great part of CWSP until the early 2000s, and this lack of data certainly affects the datasets. However, the observation of the same trends in ERA20C, in CRU TS4.0, in CRU TS3.0 [27] and in time series constructed from various data sources [43] is an indication that this is not an artifact.
This increasing tendency is expected to worsen over the next century, as soil moisture is predicted to decrease [44]. An increasing frequency of droughts in the north and northeastern regions of Sao Paulo specifically during the crop-growing season (austral spring) was also identified [17], while in the southern region the extreme rainy events had a positive tendency. Rainfall amounts are decreasing in CWSP during the dry period (austral winter) as well [45].
The positive (negative) trend of severe droughts in CWSP (on the eastern coast of Sao Paulo) is again in agreement with the results of [27]. They showed that these severe droughts became more frequent in the region throughout the 20th century. Also, during this period, the area affected by these events has been growing all over the world.
To the authors' best knowledge, there are no previous works assessing the representation of severe drought events in reanalyses. Underestimation of the positive drought tendency in CWSP and overestimation of negative tendency in ESP may be related to the fact that ERA20C tends to show more precipitation than observed in Sao Paulo [35,46].

Conclusions
In this study we presented the climatology of severe drought events in the state of Sao Paulo. The events were identified using the SPEI (in the timescales of 3, 12 and 48 months), for two observation datasets (IAG and CRU) and one reanalysis (ERA20C). The number of events, their interdecadal tendency and characteristics such as duration and severity were assessed and compared between observation and reanalysis for three areas: the metropolitan area of Sao Paulo, the eastern sector of the state and the central-western region. A summary of results for each region follows:
(1) For the city of Sao Paulo, in the period of 1933-2010, ERA20C presented a smaller number of events than the observation, and with a slightly underestimated mean intensity. A negative trend with significance above 90% was seen in the observed data for SPEI 12 and 48, but the reanalysis was not able to reproduce the magnitude of this trend.
(2) In the ESP region, SPEI 3 (48) presented more (fewer) cases in the reanalysis, with a low to moderate correlation between the time series. The mean duration, severity and intensity of the events were well reproduced by ERA20C for SPEI 3 and 12, and underestimated for SPEI 48. The frequency of observed events presented a small negative trend through the 20th century, which is also present in the reanalysis.
(3) The CWSP region also presented low correlation between the observed and reanalysis time series, but the number of cases was almost the same for SPEI 12 and 48. Also the duration, severity and intensity were similar in both datasets for the three timescales. The trend analysis showed an increase in severe droughts throughout the 20th century (for SPEI 3 and 12). The reanalysis also presented a positive trend, but with smaller magnitude.
Overall, the ERA20C reanalysis seems to be suitable for representing severe drought events in Sao Paulo. The mean characteristics are well reproduced, the number of events in the period is similar and the trends are present, though smaller than in the observation. This may be due to the documented overestimation of precipitation in this reanalysis over the considered region.