Machine Learning Identification of Attributes and Predictors for a Flash Drought in Eastern Australia

Abstract: Flash droughts (FDs) are natural disasters that strike suddenly and intensify quickly. They can occur almost anywhere, at any time of the year, and can have severe socio-economic, health and environmental impacts. This study focuses on a recent FD that began in the cool season in the Upper Hunter region of Eastern Australia, an important local and global exporter of energy and agricultural products that is both flood- and drought-prone. Here, the authors investigate the FD that started abruptly in May 2023 and extended to October 2023. The FD followed floods in November 2021 and much above-average May–October 2022 rainfall. Eight machine learning (ML) regression techniques were applied to the 60 May–October periods from 1963–2022, using a rolling windows attribution search from 45 possible climate drivers, both individually and in combination. The six most prominent climate drivers, and likely predictors, provide an understanding of the major contributors to the FD. Next, the 1963–2022 data were divided into two shorter timespans, 1963–1992 and 1993–2022, generally accepted as representing the early and accelerated global warming periods, respectively. The key attributes were markedly different for the two timespans. These differences are readily explained by the impacts of global warming on hemispheric and synoptic-scale atmospheric circulations.


Introduction
A flash drought (FD) occurred in Eastern Australia between May and October 2023 after record rainfall in the same period in 2022. At the end of April 2023, most of the state of New South Wales in southeast Australia showed no indicators of an impending agricultural drought. As defined by the New South Wales Department of Primary Industry [1], these indicators include decreased rainfall, increased heat and reduced plant growth (Figure 1a). The USA National Oceanic and Atmospheric Administration (NOAA) defines a FD as "the rapid onset or intensification of drought that develops from a period of low precipitation accompanied by higher-than-average temperatures." The higher temperatures increase both the evaporation of water from the soil and transpiration from plants (evapotranspiration), further lowering soil moisture, which decreases rapidly as drought conditions continue. FDs can occur at any time of year in Australia [2]. Although FDs mostly occur in summer and autumn in Australia, they can also occur in winter in southeast and southwest Australia [3]. In May 2023, there was an abrupt decrease in rainfall over southeast Australia, which continued until November in the Upper Hunter region (Figure 1b). The consensus is that the primary contribution to FD is precipitation deficit, while evapotranspiration intensifies it [2,4-6].
The focus region of this study is the Upper Hunter region of Eastern Australia, which experienced a FD in the period May to October 2023, as defined by the New South Wales state Department of Primary Industry (NSW DPI; Figure 1b). In marked contrast, from May to October 2022, the Upper Hunter region, including the towns of Aberdeen,
The potential for current and future applications of machine learning (ML) techniques to FD is discussed in the review by Tyagi et al. [5]. A key feature of ML is the ability to deal with multi-source data, non-linearity, multicollinearity and non-stationarity by identifying regional attributes [5]. A recent study by the co-authors detected attributes for the increasingly higher summer maximum temperatures in western Sydney compared with coastal Sydney [9].
Here, the authors concentrate on rainfall variability and use ML techniques on the FD data from May to October 2023 in the Upper Hunter region to identify the most important climate drivers (attributes) that can be employed as predictors of rapid changes in rainfall associated with the FD. This FD differs in its characteristics from the typical FD described in the literature review of Tyagi et al. [5]. It was generated by a severe precipitation deficiency early in the cool season (May to September) and enhanced by record warm temperatures within the same cool season period (May to October). This differs from the warm season noted as the most common time for FD occurrence by Tyagi et al. [5]. Furthermore, climate drivers such as ENSO and IOD were not applicable at the start of the rapid decline in rainfall in May. However, ENSO did become a factor as the FD developed during the cool season, when lower rainfall months can first appear in an El Niño phase of ENSO.
Following the Materials and Methods section, the Results section starts with a rainfall time series for the Upper Hunter accompanied by statistical significance testing of the May to October rainfall mean and variance of the pre- and accelerated global warming periods in the data. Next, the dominant attributes of the FD are assessed by ML techniques from a broad range of potential attributes. The Discussion section draws implications on the validity and limitations of the ML techniques used in this study, with further explanations drawing on observed tropospheric circulation changes.

Data Sources
May to October total monthly rainfall data from 1963-2023, representing the Upper Hunter region of the state of New South Wales (NSW) in Eastern Australia, were obtained from the Australian Bureau of Meteorology rainfall records for Bunnan, Aberdeen, Scone, Muswellbrook and Rouchel Brook, as indicated in Figure 1. Missing Aberdeen values were substituted by values from Muswellbrook, the closest station. Missing values for Scone were substituted by values from the closest station, Jerrys Plains, whereas Rouchel Brook and Bunnan had complete data. All precipitation data are available online at: http://www.bom.gov.au/climate/data/ (accessed on 7 March 2024).
To detect statistically significant differences in the dataset, the authors perform permutation testing. To assess the impacts of accelerated GW on May to October Upper Hunter precipitation, the dataset was split into two 30-year periods, 1963-1992 and 1993-2022, with the second period aligning with accelerated global warming (GW) from the 1990s [10][11][12][13] and accelerated GW impacts in southeast Australia [9,14]. Hence, the data for the entire period are non-stationary. Permutation testing of data (without replacement) between two different time periods does not require the data time series to represent a stationary distribution or rely on other underlying assumptions of a parametric distribution [15,16]. Permutation testing was performed on differences in the means and the variances between the two periods.
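As an illustration of the permutation testing described above, the following is a minimal sketch rather than the authors' implementation. The function names `permutation_test` and `sample_variance`, the number of permutations, the two-sided absolute-difference statistic and the synthetic rainfall values are all assumptions introduced here for illustration.

```python
import numpy as np


def permutation_test(x, y, statistic, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in `statistic` between x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(statistic(x) - statistic(y))
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_permutations):
        # Reassign the period labels at random, without replacement.
        shuffled = rng.permutation(pooled)
        diff = abs(statistic(shuffled[:x.size]) - statistic(shuffled[x.size:]))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)


def sample_variance(a):
    """Unbiased sample variance."""
    return np.var(a, ddof=1)


# Synthetic placeholder totals (mm), NOT the Upper Hunter observations:
# two 30-year samples with equal means but different spreads.
demo_rng = np.random.default_rng(42)
early = demo_rng.normal(300.0, 60.0, size=30)
late = demo_rng.normal(300.0, 150.0, size=30)

p_mean = permutation_test(early, late, np.mean)
p_var = permutation_test(early, late, sample_variance)
print(f"p (means) = {p_mean:.3f}, p (variances) = {p_var:.3f}")
```

Because the null distribution is built by relabelling the pooled values, no parametric or stationarity assumption enters the calculation; only exchangeability of the labels under the null hypothesis is assumed.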

Methods
Both linear and non-linear ML techniques were used to identify the attributes (for possible predictors) that explain trends found in the May-October precipitation time series for the Upper Hunter region. The predictor variables were considered individually as climate drivers in addition to two-way interactions (scalar multiplication between climate drivers), which are intended to account for the compound influence of drivers. Interactions of more than two drivers were not considered because they are time-prohibitive. The linear method used was linear regression with both forward and backward selection. Two non-linear methods, Support Vector Regression (SVR) and Random Forests (RF), were employed, both with forward and backward selection, with SVR using both radial basis function (RBF) and polynomial (Poly) kernels. Therefore, a total of eight different methods were considered. These methods are explained in more detail in previous studies [9,17,19]. Each of the eight methods used rolling windows with a training/testing split of 10 years each, where the training window moves forward by one year for each fitted model. Hence, the first training/testing period is 1963-1972/1973-1982, followed by 1964-1973/1974-1983

Precipitation Time Series
The time series of the 60-year period of May to October rainfall from 1963-2022, for the Upper Hunter region using the observations described in the Data Sources section, is shown in Figure 3a. The years 2022 and 2023 are highlighted by red boxes. There is a pronounced decrease from well above the 95th percentile (and the second highest recorded in the entire 61-year period) in May to October 2022, down to the 10th percentile in May to October 2023. However, FDs are initiated in a matter of months rather than on a time scale of a year or longer. Notably, in the two months of March and April 2023, the Upper Hunter experienced average to above-average rainfall (Deciles 4-7 and 8-9, respectively). April and May were then followed by Decile 1 rainfall from June until October. Other regions in the state of NSW experienced extreme short-duration rainfall and flash floods in November and December 2023, but at the end of December 2023, the Upper Hunter continued to experience intense drought conditions (Figure 3b).

The Upper Hunter percentile plots in Figure 3a are calculated by totalling the May to October precipitation for the five observing sites, A, S, B, M and RB, shown in Figure 1b-e. They reveal three major decreases in May-October precipitation, occurring in 1993, 2017 and 2023, of which the 2023 May-October period was chosen for this study. Following Decile 10 May-October 2022 precipitation (Figure 3a), it also coincided with the dramatic increase in May-October maximum temperature in 2023 to Decile 10, immediately after Decile 1 May-October 2022 maximum temperature (Figure 3c). The mean maximum temperature in Figure 3c is based on the daily maximum temperatures for Scone, as indicated in Figure 1b-e.
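The regional totals, percentile bands and deciles referred to above can be sketched as follows. This is an illustrative calculation only: the station values are synthetic, the helper `decile_of` is hypothetical, and its simple rank-based decile rule is an assumption that may differ from the convention used by the Australian Bureau of Meteorology.

```python
import numpy as np


def decile_of(value, reference):
    """Rank-based decile (1-10) of `value` relative to a reference sample."""
    reference = np.asarray(reference, dtype=float)
    count = int(np.sum(reference <= value))
    # Integer ceiling of 10 * count / n, avoiding floating-point rounding.
    decile = -(-10 * count // reference.size)
    return min(10, max(1, decile))


# Synthetic May-October totals (mm) for 60 years at five stations,
# standing in for Aberdeen, Bunnan, Muswellbrook, Rouchel Brook and Scone.
rng = np.random.default_rng(1)
station_totals = rng.gamma(shape=4.0, scale=60.0, size=(60, 5))

# Regional series: sum over the five stations for each year.
regional = station_totals.sum(axis=1)

# Percentile bands of the kind drawn as dashed lines in a percentile plot.
levels = [5, 10, 25, 50, 75, 90, 95]
bands = np.percentile(regional, levels)

for level, band in zip(levels, bands):
    print(f"{level:>2}th percentile: {band:7.1f} mm")
print("Decile of the final year:", decile_of(regional[-1], regional))
```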

Box Plots
The May to October rainfall totals were converted into box plots to illustrate the data distribution for the time periods 1963-1992 and 1993-2022. Figure 4 (left panel) shows that the mean May to October rainfall is similar for both periods. However, there is a clear indication of a markedly larger variance in May to October rainfall for 1993-2022, as shown in Figure 4 (right panel).

Permutation Testing
Permutation testing was performed to detect statistically significant differences in the means and variances of the May-October total rainfall time series between the two climate periods 1963-1992 and 1993-2022. As expected from Figure 4, there is no significant difference between the means of the two periods (p-value = 0.91), whereas the difference in the variances is highly significant (p-value = 0.002).

Machine Learning Attribution Results
Attribute selection was conducted for the entire 60-year May-October period 1963-2022. For identification of each climate driver and its index used in Tables 1 and 2, see Table 3. The rolling window training/test period split begins at 10 years each, with the training period moving forward in time by one year for each model trained, which results in a total of 41 training/test periods. This training/testing period split was selected because the training and testing occur in the consecutive time periods typically designated as pre- and post-accelerated GW [10]. In Table 2a, the mean percentage of the rolling windows for all attributes using the eight different methods is shown in column 10. The six most prominent attributes are highlighted in yellow. SSTs in the equatorial central Pacific Ocean (area Niño3.4) dominate singularly and in combination with GlobalSSTA, the SOI and the IPO. In Table 2a, there is only one dominant attribute, IOD*SAM, that includes SAM for the whole period 1963-2022. However, it is well known that increasingly positive SAM in recent decades has resulted in the circumpolar midlatitude westerlies contracting poleward. Consequently, decreased frontal system rain in the late winter-spring of the southern half of Australia has occurred [20,21]. Therefore, the authors also ran the attribute selection over the two periods 1963-1992 (Table 2b) and 1993-2022 (Table 2c), separately. However, SAM did not appear singularly or in combination as a dominant attribute in the 1993-2022 period at all (Table 3). It did appear singularly and in combination as the top four dominant attributes in the earlier 1963-1992 period (Table 3). Those four leading attributes are SAM, GlobalT*SAM, Niño3.4*SAM and SAM*TSSSTA. This feature, in addition to the fact that all predictors are selected infrequently, as indicated by the mean percentages, is explored further in the following Discussion section.
In other studies conducted by the authors, Support Vector Regression has outperformed Random Forest regression [18]. We also note that in this study, two of the SVR schemes also did well, even if not as well as Random Forest backward selection. The relative performance of the ML schemes supports our comments that, in general, it is not known beforehand which ML scheme might perform best, so employing a wide selection of ML methods is the accepted best approach. Note that the ML methods employed here are statistical regression schemes, i.e., they are not numerical climate prediction models. The ML techniques used here determine the dominant attributes to detect the importance of contributions from the wide range of available climate drivers. The key climate drivers detected by the ML schemes can be used to produce ML prediction models, as shown in previous studies (e.g., [17,22]).
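The attribute selection percentages reported above can be illustrated with a simplified sketch for one of the eight schemes, linear regression with forward selection. This is not the authors' code: the helper names, the stopping rule (add a feature only while the test-window mean squared error improves), the cap of three features, the driver names A, B and C and the synthetic data are all assumptions made here for illustration.

```python
import itertools
import numpy as np


def add_interactions(X, names):
    """Append all two-way products of the columns of X to the feature set."""
    cols = [X[:, j] for j in range(X.shape[1])]
    labels = list(names)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, evaluated on X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def forward_select(X_train, y_train, X_test, y_test, max_features=3):
    """Greedily add the feature that most reduces the test-window MSE."""
    selected = []
    best_err = np.mean((y_test - y_train.mean()) ** 2)  # intercept-only baseline
    while len(selected) < max_features:
        trials = []
        for j in range(X_train.shape[1]):
            if j not in selected:
                cols = selected + [j]
                pred = fit_predict(X_train[:, cols], y_train, X_test[:, cols])
                trials.append((np.mean((y_test - pred) ** 2), j))
        if not trials:
            break
        err, j = min(trials)
        if err >= best_err:
            break
        selected.append(j)
        best_err = err
    return selected


def selection_percentages(X, y, W=10, h=10, S=1):
    """Percentage of rolling windows in which each feature is selected."""
    counts = np.zeros(X.shape[1])
    n_windows = 0
    for start in range(0, len(y) - W - h + 1, S):
        tr = slice(start, start + W)
        te = slice(start + W, start + W + h)
        for j in forward_select(X[tr], y[tr], X[te], y[te]):
            counts[j] += 1
        n_windows += 1
    return 100.0 * counts / n_windows, n_windows


# Synthetic example: 60 "years", three hypothetical drivers, and a target
# that depends on the A*B interaction only.
rng = np.random.default_rng(0)
drivers = rng.normal(size=(60, 3))
X, labels = add_interactions(drivers, ["A", "B", "C"])
y = 2.0 * X[:, labels.index("A*B")] + rng.normal(scale=0.1, size=60)

pct, n_windows = selection_percentages(X, y)
for label, p in zip(labels, pct):
    print(f"{label:>4}: selected in {p:5.1f}% of {n_windows} windows")
```

In the study itself, such percentages are computed for each of the eight schemes and then averaged, which this single-scheme sketch does not attempt to reproduce.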

Discussion
The most dominant Pacific Ocean and atmospheric attributes, and hence the likely predictors, of May-October Upper Hunter precipitation over the years 1963-2022 appear to be Niño3.4*IPO, Niño3.4*SOI, GlobalSSTA*Niño3.4, Niño3.4 and SOI*IPO. These predictors indicate the dominance of the Walker Circulation combined with increasing central eastern Pacific Ocean equatorial SSTs. A strengthening of the Indian Ocean SST gradient represented by the IOD has also been linked to a stronger Walker Circulation [23]. The IOD*SAM attribute, also found as a likely predictor over 1963-2022, may be linked to the positive SAM and the deep, moist easterly winds combining with anomalous moist SSTs off the northwest coast of Australia during the spring La Niña months of the negative IOD in both La Niña episodes of 2010-12 and 2020-2022. It is noted that La Niña is not a strong indicator of wet conditions, and El Niño is not a strong indicator of dry conditions in some parts of southeast Australia [24], particularly the coastal areas (e.g., [25,26]). However, there are factors contributing to the widespread heavy rain events in most of southeast Australia during November and the summer months of the current El Niño, which can now be discussed in terms of the ML findings and the observed tropospheric circulation changes.
As mentioned above, SAM was a far more dominant attribute during the initial period of 1963-1992 (Table 3). In the later period, 1993-2022, the disappearance of SAM as a predictor can be explained by the observed changes in the SH tropospheric circulation that occurred after the 1990s. Prior to the 1990s, negative 700 hPa anomalies show that the circumpolar westerly winds favoured rain-producing frontal and low pressure systems developed at low mid-latitude and subtropical latitudes across the southern half of Australia and the adjacent Indian Ocean in the cool season months of May-October (Figure 5a). Slight north/south changes in this latitudinal atmospheric pattern imply slight corresponding negative/positive variations in the value of the SAM index. However, the SAM has been trending positive since the 1960s owing to the poleward contraction of the mid-latitude westerly wind belt and the increasing size of the ozone "hole" each spring [27][28][29][30].
Since the 1990s, ozone levels have been recovering [31], while GW has accelerated [10]. As a result, poleward contracted westerly winds are dominated by negative geopotential heights around the Amundsen Sea Low in western SH longitudes. In contrast, there are positive geopotential heights at low mid-latitudes in eastern SH longitudes, particularly in the Australia-New Zealand region (Figure 5b). This pressure imbalance and change in polarity of the geopotential heights around the mid-to-high latitudes of the SH have led to a positive SAM trend in the cool season [31], and the positive pressure anomalies distributed across the southern part of the Australian continent have resulted in the FD of May-October 2023 (Figure 5c). However, the pressure imbalance around the SH, with negative geopotential heights in the western SH partially cancelling out the strongly positive geopotential heights in the eastern SH from the Indian Ocean to the east of New Zealand, explains the absence of SAM attributes since the 1990s. The pressure imbalance is particularly noticeable in November 2023 around the SH mid-latitudes, as shown in Figure 5d.
In November 2023, on the Australian synoptic scale, atmospheric circulation anomalies resulted in a persistent low-pressure trough at 200 hPa linked to Antarctic low pressure and over Australia down through the levels at 500 hPa and 850 hPa (Figure 6a-c), providing a deep, moist layer through southeast Australia, which resulted in outbreaks of heavy rain-producing thunderstorms and localized flooding. Anomalous moisture is evident at 300 hPa in southeast and southwest Australia during November and December 2023 (Figure 6d). The upper trough and subsequent low-pressure development through the middle and low atmospheric levels over Australia were instigated by a branch of the subtropical jet around the apex of the upper trough over Australia into the last week of November. This Antarctic-linked upper low-pressure trough over southeast Australia was slow-moving, trapped between the two anomalous high-pressure centres south of Australia and east of New Zealand (Figure 5d). However, despite record-breaking November rainfall from thunderstorms in parts of inland Eastern Australia, such rainfall did not occur in the Upper Hunter, which remained in drought in late December 2023 (Figure 3b).
Another feature of the ML predictors is the change from those involving the climate driver Niño3.4 before the 1990s to those involving the PMM since the 1990s (Table 3). A locked-phase relationship between PMM and the ENSO lifecycle, especially during strong El Niño events such as 1982, 1997, 2015 and now 2023, suggests a close linkage between ENSO and PMM variability on seasonal to interannual timescales, which is an ongoing area of research [32]. Furthermore, a negative IPO plays a role in the stronger impact of PMM on central Pacific-ENSO since the 1990s [33], requiring that the modulation effects of IPO be considered in understanding the extratropical-tropical climatic connection and ENSO spatial diversity. Note that PMM, IPO and PMM*IPO were all found to be dominant predictors for the more recent period 1993-2023 (Table 3).

Conclusions
As the global climate warms, flash droughts are becoming more frequent and more severe. This is particularly true of Australia, which has been identified by the IPCC as

Figure 2. Schematic diagram of the ML training/test method. W = the training window; h = the test window; S = the size of the rolling window step, and N is the total number of training/test windows. In this 60-year study, which starts in 1963, W = 10, h = 10, S = 1 and N = 41.
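The window arithmetic in the Figure 2 caption can be checked with a short sketch. The function name `rolling_windows` and its return format are assumptions introduced for illustration; only the values of W, h and S and the 1963-2022 span come from the text.

```python
def rolling_windows(first_year, last_year, W=10, h=10, S=1):
    """List ((train_start, train_end), (test_start, test_end)) year pairs.

    W is the training window length, h the test window length and
    S the step by which the training window advances, all in years.
    """
    windows = []
    start = first_year
    # Keep stepping while the test window still ends within the record.
    while start + W + h - 1 <= last_year:
        train = (start, start + W - 1)
        test = (start + W, start + W + h - 1)
        windows.append((train, test))
        start += S
    return windows


windows = rolling_windows(1963, 2022)
print("N =", len(windows))
print("first:", windows[0])
print("last: ", windows[-1])
```

With these settings the final training window is 2003-2012 and the final test window is 2013-2022, which gives the 41 training/test periods stated in the caption.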

Climate 2024 , 16 Figure 3 .
Figure 3. Total precipitation, maximum temperature time series 1963-2023 and drought status in December 2023. (a) Total precipitation time series, May-October 1963-2023, for Aberdeen-A, Bunnan-B, Muswellbrook-M, Rouchel Brook-RB and Scone-S. Solid black line is total May-October precipitation 1963-2022 for all stations. Dashed lines are 5th and 95th percentiles (red), 10th and 90th percentiles (orange), 15th and 85th percentiles (light blue), 20th and 80th percentiles (brown), 25th and 75th percentiles (dark blue), and solid black line is the median. Open red squares show May-October 2022 is Decile 10 (very much above average) and May-October 2023 is Decile 1 (very much below average). (b) Mean TMax time series May-October 1963-2023 from Upper Hunter daily observations for Scone. Dashed lines are 5th and 95th percentiles (bottom and top red), 10th and 90th percentiles (bottom and top orange), 15th and 85th percentiles (bottom and top light blue), 20th and 80th percentiles (bottom and top brown), 25th and 75th percentiles (bottom and top dark blue), and the horizontal solid black line is the median. Open red squares show May-October 2022 is Decile 1 (very much below average), and May-October 2023 is Decile 10 (and highest recorded). (c) New South Wales DPI combined drought indicator, 31 December 2023, shows the Upper Hunter experiencing drought with some areas in intense drought.

Figure 4. Box and whisker plots of mean precipitation (left panel) and variance (right panel) for the consecutive periods 1963-1992 and 1993-2022, for the Upper Hunter. The inter-quartile range, from the 25th to the 75th percentiles, is represented by the horizontal lines of the boxes, with the medians (50th percentiles) shown as thick, horizontal black lines. The lower horizontal lines are the 10th percentiles and the upper horizontal lines near the outliers are the 90th percentiles. The circles above and below the 10th and 90th percentile lines are the outliers.

Table 2. (a) Attribute/Predictor selection for Upper Hunter precipitation data 1963-2022 using the eight different ML techniques. The percentage of rolling windows time series selection of the attributes is shown for each of the eight methods (columns 2-9). The mean and standard deviation for rows and columns are shown. The six most dominant attributes and hence likely predictors are highlighted (yellow). (b) Attribute/Predictor selection for Upper Hunter precipitation data 1963-1992 using the eight different ML techniques. The percentage of rolling windows time series selection of the predictors ≥45% is shown for each of the eight methods (columns 2-9). The mean and standard deviation are shown for all rows and columns. The six most dominant attributes are highlighted (yellow). (c) Attribute/Predictor selection for Upper Hunter precipitation data 1993-2022 using the eight different ML techniques. The percentage of rolling windows time series selection of predictors ≥45% is shown for each of the eight methods (columns 2-9). The mean and standard deviation for rows and columns are shown. The seven most dominant predictors are highlighted (yellow).

The climate drivers listed by acronym are identified in Table 1 below. Tasman Sea sea-surface temperature anomalies (TSSSTA) were obtained from the Australian Bureau of Meteorology online climate data archives: http://www.bom.gov.au/cgi-bin/climate/

Table 1. Identification of climate drivers with their acronyms and indices used.

Table 2. Cont.
one will be best. In other climate applications of ML, SVM has performed well. which