Drought Prediction for Areas with Sparse Monitoring Networks: A Case Study for Fiji

Hybrid drought prediction models were developed for areas with limited monitoring gauges using the APEC Climate Center Multi-Model Ensemble (APCC MME) seasonal climate forecast and the machine learning models Extra-Trees and Adaboost. The models provide spatially distributed, detailed drought prediction data of the 6-month Standardized Precipitation Index for the case study area, Fiji. To overcome the limitation of a sparse monitoring network, both in-situ data and bias-corrected, dynamically downscaled historical climate data from the Weather Research and Forecasting (WRF) model were used as reference data. The mean absolute error and classification accuracy were used as performance measures. The WRF outputs reflect the topography of the area. Hybrid models showed better performance than simply bias-corrected forecasts in most cases. In particular, the model based on Extra-Trees trained using the WRF model outputs performed the best in most cases.


Introduction
Islands in the South Pacific are vulnerable to climate change [1]. The climate in the South Pacific has become drier by 15% and warmer by 0.8 °C compared to the early 20th century [2]. Fiji, one of the key Pacific Island countries, experiences easterly trade winds on most calendar days. The easterly trade winds or the northeasterly monsoon, when lifted by high mountains, cause moisture condensation and produce heavy rainfall on the windward eastern side of Fiji. The subsidence of the relatively dry air produces less rainfall on the leeward western side.
From a large-scale viewpoint, the El Nino Southern Oscillation (ENSO) is the main cause of climate variability over this region at interannual timescales. La Nina events dominated the interannual sea surface temperature (SST) anomaly (SSTA) over the central Equatorial Pacific between 1950 and 1975; after that time, El Nino events became more frequent [3]. The Pacific Decadal Oscillation (PDO) dominates the climate variability at decadal timescales [4]. The PDO was mostly positive prior to 1998 and then shifted to a strong negative phase [5]. A positive PDO is characterized by an SSTA pattern over the Equatorial Pacific similar to that of El Nino, and thus shifts the weather systems northeastward, but on a decadal timescale. The South Pacific Convergence Zone (SPCZ) is a reverse-oriented monsoon trough with strong low-level convergence and a rainfall band that extends from the Warm Pool southeastward to French Polynesia [6,7]. The interacting impacts of ENSO and PDO on the SPCZ are complex [8,9]. El Nino events weaken the Walker Circulation and shift the dominant weather systems over the Equatorial Pacific, such as the SPCZ, toward the northeast. When El Nino takes place during the positive PDO, the SPCZ moves northeast towards the equator, and its intensity becomes stronger [8]. This departure of large-scale convection decreases precipitation over Fiji and leads to droughts [10].
Based on analysis performed using the Standardized Precipitation Index (SPI), the western and northern areas of Fiji have experienced more frequent dry conditions since the 1950s than in previous decades. Analysis of observed monthly rainfall for Fiji over the period showed downward trends at a 99% confidence level with decreases in rainfall of approximately 13-47 mm per year [11]. Although no significant long-term trends were observed in annual rainfall [12], there were more frequent dry seasons during the last 50 years compared to the first 50 years when the nearly 100 years of data since 1900 were examined [13]. The local temperature also increased due to the effects of climate change [14]. The most impacted stations were located in western and northern Fiji, where deficiency in rainfall from 1969-1988 caused an increase in moderate and severe droughts [11]. Risbey et al. [15] projected an increase in rainfall of approximately 3.3% by 2025 and 9.7% by 2100 using a global climate model (GCM). Feresi et al. [16] and Agrawala et al. [17] did not project a definitive change in rainfall. IPCC [18] projected that Fiji will experience an intensified seasonal cycle, i.e., a rainfall decrease in the dry season and a rainfall increase in the wet season. The shift towards extended dry spells causes loss of soil fertility, which could negatively affect agriculture [1].
Since 1940, severe droughts have occurred in 1942, 1958, 1969, 1978, 1983, 1987, 1992, 1997-1998, 2003, and 2010 [16]. Severe droughts can cause serious socio-economic loss as well as physical damage as drought conditions persist. The ENSO event of 1997-1998 caused a severe drought with damages of up to Fiji $100 million. Rainfall failure occurred across two successive dry seasons, and more significantly during the intervening wet season when precipitation is normally reliable [16]. Since many rural communities rely on rainwater, streams, and shallow wells for domestic use, watering crop gardens, and livestock, these communities are especially vulnerable to periods of drought when surface water resources are at a minimum [19]. Schools and businesses were forced to close, and residential areas were disrupted. Such impacts created extreme difficulties for Fiji since the resources of an island country are limited. External aid and governmental assistance were required to ensure the supply of sustenance and facilitate recovery in the worst-hit parts of Fiji, which included the western and northern divisions and outer islands.
Drought conditions in Fiji are currently monitored using the 3-, 6-, and 12-month SPI calculated for weather stations with long historical records [20]. The network of stations with long records is quite sparse, however, resulting in considerable uncertainty in the estimates of extreme wet and dry events. Evidence shows that estimation of the historical trends has a large noise-to-signal ratio over regions with sparse data networks [21]. Furthermore, most Fiji weather stations with long records are located along the coastline, so the sparse network cannot capture small-scale convective precipitation over land or precipitation from orographic lifting over mountains. Rainfall variability in the high mountains is greater than the variability in cities.
The limited variables and inconsistent duration of satellite observations introduce difficulties and uncertainties in methods and analysis. For example, the Climate Prediction Center Morphing Technique (CMORPH) data are only available from 1998 onward. Due to the limited number of variables being observed, it is difficult to prepare for droughts because the response of rainfall distribution to large-scale dynamics is unclear. In addition, unlike other types of disasters, the onset and termination of droughts are not always clear. The increase in uncertainty of climate variability makes the reduction of drought impacts even more difficult.
The drought outlook for Fiji is also provided based on SPI: SPI predictions for weather stations are based on the statistically downscaled seasonal forecast data from the Seasonal Climate Outlooks for Pacific Island Countries developed by the Bureau of Meteorology of Australia. Spatially distributed drought prediction, possibly reflecting the orographic effect of the main island, would help to prevent and minimize the adverse impacts of droughts in Fiji. Drought prediction data that are only available for weather stations, or that are obtained from low-resolution bias-corrected seasonal forecast data, are not sufficient for effective decision making.
This study aims to develop a drought prediction model that can be used for areas with sparse monitoring networks, with Fiji as the case study area. By providing spatially detailed drought prediction data, vulnerability to droughts may be reduced while resiliency may be increased. Multi-Model Ensemble seasonal climate forecast data from APEC Climate Center (APCC MME) are used to provide up to 6 months-lead climate forecasting. Machine learning models are used to provide spatially distributed drought information for ungauged areas. In order to overcome the limitation of sparse monitoring networks, dynamically downscaled historical climate data from the Weather Research and Forecasting (WRF) model are used as reference data to train machine learning models instead of in-situ data.
Water 2018, 10, 788
This study ultimately targets national, provincial, and regional officials whose main duties include water resources and agricultural management. The final beneficiaries of the output are residents of the area: water users and farmers whose decision-making can be helped by drought prediction information with finer spatial resolution.

Study Area
Fiji has a total area of about 194,000 km², of which approximately 10% is land. Fiji consists of 332 islands. The two largest islands are Viti Levu and Vanua Levu, which account for about three-quarters of the total land area of Fiji [22]. Figure 1 shows the topography of Fiji's main islands. The largest island, Viti Levu, which has an area of 10,388 km², is covered with thick tropical forest. The island has a considerable area higher than 500 m in elevation, with the peak of Mount Tomanivi at 1324 m above sea level. Viti Levu hosts the capital city of Suva and contains about three-quarters of the population. Other important towns include Nadi, where the international airport is located, and Lautoka.
Fiji has a tropical marine climate and is warm year-round with minimal extremes. The warm season lasts from November to April and the cool season lasts from May to October. Temperatures in the cool season average 22 °C. Winds are moderate, though cyclones occur about once a year (10-12 times per decade). Viti Levu is a mountainous volcanic island with a wet-dry tropical climate. The southeast side of the island faces the predominant trade winds and therefore receives more precipitation than the northwest side, which is rain-shadowed by interior highlands. The volcanic mountains force orographic lifting of the saturated air, which can produce extremely heavy rainfall on the windward side of the mountains. Rainfall on the leeward side is much lighter due to the subsidence of the dry air, which largely influences agriculture in those areas. In the dry season, the uneven distribution of rainfall can cause a prolonged lack of moisture on the leeward side. The leeward side only receives 20% of the annual total rainfall in the dry season, compared to 33% received on the windward side [23].
Sugar export is an important source of foreign exchange for Fiji, as sugar cane processing makes up one-third of industrial activity. Coconut, ginger, and copra are also significant industries. These agricultural products are highly influenced by climate extremes; the sugar industry was damaged by drought in 1998.

In-Situ Data
Figure 2a shows the locations of the rainfall gauges on the two main islands used in this study (Table 1). In-situ rain-gauge hourly precipitation data for 1981-2010 were obtained, and daily data for the period were used for the bias-correction of the WRF model. Monthly data were also used for calculating drought index values for the training of machine learning models. Some data were missing for short periods from the gauges at Udu Point and Nabouwalu.

WRF Model Outputs
Dynamic downscaling of historical climate through the WRF model, forced by the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA)-Interim dataset in a double nested framework with spectral nudging in the parent domain, was used in this study [24]. Validation results indicate that the WRF outputs are reasonably reliable. Precipitation data with 8 km spatial resolution for 1981-2010 were used in this study. Centroids of the 227 grid cells are shown in Figure 2b.

SPI
The SPI is widely used to characterize meteorological drought on a range of timescales [25,26] (Table 2). It quantifies observed precipitation as a standardized departure from a selected probability distribution function that models the raw precipitation data. The raw precipitation data are fitted to a gamma distribution, for example, and then transformed to a normal distribution. The SPI values can be interpreted as the number of standard deviations by which the observed anomaly deviates from the long-term mean. The SPI can be calculated for accumulation periods of 1 to 36 months using monthly input data. The SPI can be compared across regions with markedly different climates. In this study, the 6-month SPI (SPI6) was used to examine the performance of the drought prediction model developed, which is based on APCC MME forecast data with up to 6 months of lead time. SPI6 is also used by the Fiji Meteorological Service (FMS) to examine agricultural (soil moisture) and hydrological droughts because 6-month droughts affect deeper rooted plants and medium-sized water bodies [27].
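The gamma-to-normal transformation described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study: it assumes strictly positive, non-constant accumulated precipitation, estimates the gamma parameters with Thom's approximation to the maximum-likelihood fit (a common choice in SPI software, but an assumption here), and all function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist

def fit_gamma_thom(values):
    """Approximate maximum-likelihood gamma fit (Thom's approximation)."""
    n = len(values)
    mean = sum(values) / n
    a = math.log(mean) - sum(math.log(x) for x in values) / n
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    scale = mean / shape
    return shape, scale

def gamma_cdf(x, shape, scale, max_terms=1000):
    """Gamma CDF via the series for the regularized lower incomplete gamma function."""
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)
        total += term
        if term < 1e-15 * total:
            break
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total

def spi(values):
    """Transform accumulated precipitation totals into SPI values."""
    shape, scale = fit_gamma_thom(values)
    std_normal = NormalDist()
    result = []
    for x in values:
        p = gamma_cdf(x, shape, scale)
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the inverse CDF finite
        result.append(std_normal.inv_cdf(p))
    return result

# Hypothetical 6-month precipitation totals (mm) for one calendar month, sorted.
totals = [310.0, 455.0, 520.0, 610.0, 700.0, 845.0, 980.0, 1200.0]
spi_values = spi(totals)
for total, value in zip(totals, spi_values):
    print(f"{total:7.1f} mm -> SPI6 = {value:+.2f}")
```

In operational SPI calculations, months with zero precipitation are usually handled with a mixed distribution, which this sketch omits.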

PERSIANN-CDR
The drought prediction model developed in this study relies on remote sensing based precipitation data in order to compensate for the low spatial coverage of weather stations. To secure precipitation data covering a large enough area, the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN)-Climate Data Record (CDR) was used [29]. PERSIANN-CDR data were created based on infrared sensor data for the period with no microwave sensor data. The data cover 60°S-60°N, 180°W-180°E, with a spatial resolution of 0.25° × 0.25°. Daily data were obtained and converted to monthly total precipitation data.

TRMM
The Tropical Rainfall Measuring Mission (TRMM) was developed jointly by the United States (US) NASA and the Japan Aerospace Exploration Agency. The TRMM 3B42 product with 3-h data collection intervals was obtained from the NASA Goddard Earth Sciences Data and Information Services Center and converted to monthly total precipitation data. The TRMM data cover 50°S-50°N, 180°W-180°E, and have a spatial resolution of 0.25° × 0.25°. The data are in equirectangular (or geographic) projection with WGS84 datum.

GPM
The Integrated Multi-Satellite Retrievals for the Global Precipitation Measurement Mission (GPM) data were used as remote sensing based precipitation data from April 2014 onward. The data were obtained from the Precipitation Measurement Missions of NASA, cover 90°S-90°N, 180°W-180°E, and have a spatial resolution of 0.1° × 0.1°. The data are also in equirectangular (or geographic) projection with WGS84 datum. The data were converted to monthly total precipitation data.
Since the time scale of the developed drought prediction model is monthly, the 8-day land surface temperature (LST) data were converted into monthly data using the number of days of the 8-day period for each month as weights. Mean LST (LST_MEAN) was also calculated from daytime LST (LST_DAY) and nighttime LST (LST_NIGHT).
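The day-count weighting of 8-day composites can be illustrated as follows. This is a hedged sketch, not the processing code of the study: the function name, the input layout (a list of composite start dates and values for one pixel), and the sample values are hypothetical, and every composite is assumed to span exactly eight days.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_from_8day(composites, period_days=8):
    """Convert 8-day composite values to monthly means weighted by day counts.

    composites: iterable of (start_date, value) pairs for one pixel.
    Returns a dict mapping (year, month) -> weighted monthly mean.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for start, value in composites:
        for offset in range(period_days):
            day = start + timedelta(days=offset)
            key = (day.year, day.month)
            # Each day of the composite that falls in a month adds weight 1.
            weighted_sum[key] += value
            weight_total[key] += 1.0
    return {key: weighted_sum[key] / weight_total[key] for key in weighted_sum}

# Hypothetical daytime LST composites (K): the first one straddles January and February.
lst_day = [(date(2010, 1, 25), 300.0), (date(2010, 2, 2), 310.0)]
monthly = monthly_from_8day(lst_day)
print(monthly)
```

Here the first composite contributes seven days to January and one day to February, so the February value is (1 × 300 + 8 × 310) / 9.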

MODIS Vegetation Indices
Vegetation indices of the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI) data were obtained from the Level-3 data of MODIS onboard Aqua, MYD13A3 Vegetation Indices Monthly L3 Global 1 km, from EARTHDATA of NASA from July 2002 to December 2016. Temporal and spatial resolutions are monthly and approximately 1 km × 1 km, respectively. The data are also in Sinusoidal projection.
The NDVI can be calculated using the changes in reflectance in red and near infrared (NIR) channels (Equation (1)) and has been widely used as an indicator of vegetation vigor [30]. The EVI uses the blue band in addition to red and NIR bands, minimizing the influence of the background effect of soil, snow, and water (Equation (2)). The EVI retains sensitivity to vegetation vitality in conditions where the NDVI often saturates. The blue band helps to remove the atmospheric effect caused by air and clouds.
where NIR, RED, and BLUE are reflectance values of NIR, RED, and BLUE channels, respectively; L is a parameter for reducing the background effect of canopy; C1 and C2 are weighting parameters to correct the influence of the aerosol effect of the red band when the blue and red bands are used together [31].
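As a worked illustration of the two indices, the sketch below uses the standard definitions NDVI = (NIR − RED) / (NIR + RED) and EVI = G (NIR − RED) / (NIR + C1·RED − C2·BLUE + L). The coefficient values (G = 2.5, C1 = 6, C2 = 7.5, L = 1) are those commonly quoted for the MODIS EVI product and are an assumption here, since the text does not state them; the function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index; default coefficients are assumed MODIS values."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + l)

# Hypothetical surface reflectance values for a vegetated pixel.
nir, red, blue = 0.50, 0.10, 0.05
print(f"NDVI = {ndvi(nir, red):.4f}")
print(f"EVI  = {evi(nir, red, blue):.4f}")
```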

Elevation Data
Global 30 Arc-Second Elevation (GTOPO30) data with 1 km × 1 km spatial resolution were obtained from the US Geological Survey and used for the study area.

Large-Scale Climate Index
The SPCZ, a reverse-oriented monsoon trough, is a band of low-level convergence, cloudiness, and precipitation extending from the Western Pacific Warm Pool at the maritime continent southeastward toward French Polynesia and as far as the Cook Islands (160°W, 20°S). The SPCZ occurs where the southeast trade winds from transitory anticyclones to the south meet with the semi-permanent easterly flow from the eastern South Pacific anticyclone.
To study the SPCZ and its impacts on weather and climate over the South Pacific islands, previous studies suggested several SPCZ indices [8,32-35]. Here, we adopted the SPCZ strength index from Kidwell et al. [34] to quantify the impact of the SPCZ on rainfall over Fiji. The SPCZ region was defined as 0°-30°S, 130°E-110°W. The strength of the SPCZ is defined by the surface wind convergence in this region derived from the ERA-Interim. Divergence was calculated with Equation (3):
$$D = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \qquad (3)$$
where u and v are the zonal and meridional components of the surface winds. Positive D corresponds to surface divergence, and a negative value corresponds to surface convergence. The SPCZ strength is defined by the monthly mean area-weighted average of convergence within the SPCZ region:
$$\text{SPCZ strength} = \frac{\sum a(x,y)\, D(x,y)}{\sum a(x,y)} \qquad (4)$$
where a(x,y) is the area of a grid cell centered at location (x,y), and the spatial summation ∑ is performed over grid cells with D(x,y) < 0 within the SPCZ region. The anomaly of the SPCZ strength is defined as the SPCZ index.
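The divergence and the area-weighted convergence described above can be sketched with NumPy. This is only an illustration under stated assumptions: the winds are on a regular latitude-longitude grid with monotonically increasing longitudes (for example 130 to 250 degrees east across the date line), divergence is evaluated in its spherical form, grid cell areas are taken as proportional to the cosine of latitude, and the function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius

def divergence(u, v, lats_deg, lons_deg):
    """Horizontal divergence D of surface winds; u, v shaped (n_lat, n_lon)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    lat = np.deg2rad(np.asarray(lats_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lons_deg, dtype=float))
    cos_lat = np.cos(lat)[:, None]
    du_dx = np.gradient(u, lon, axis=1) / (EARTH_RADIUS_M * cos_lat)
    dv_dy = np.gradient(v * cos_lat, lat, axis=0) / (EARTH_RADIUS_M * cos_lat)
    return du_dx + dv_dy

def spcz_strength(u, v, lats_deg, lons_deg):
    """Area-weighted mean of D over grid cells with D < 0 (convergence)."""
    d = divergence(u, v, lats_deg, lons_deg)
    lat = np.deg2rad(np.asarray(lats_deg, dtype=float))
    area = np.cos(lat)[:, None] * np.ones_like(d)  # relative cell areas
    convergent = d < 0
    if not convergent.any():
        return 0.0
    return float((area[convergent] * d[convergent]).sum() / area[convergent].sum())

# Synthetic wind field whose zonal wind weakens eastward, giving convergence everywhere.
lats = np.arange(-30.0, 1.0, 2.5)
lons = np.arange(130.0, 251.0, 2.5)
u = -np.deg2rad(lons)[None, :] * np.ones((lats.size, 1))
v = np.zeros_like(u)
strength = spcz_strength(u, v, lats, lons)
print(strength)
```

Applying this to each month's mean winds and subtracting the long-term monthly mean would give an anomaly series analogous to the SPCZ index.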

MEI
The ENSO is an irregularly periodic variation in winds and SST over the tropical eastern Pacific Ocean, affecting much of the tropics and subtropics. The warming phase is known as El Nino and the cooling phase as La Nina. The Southern Oscillation is the accompanying atmospheric component, coupled with the sea temperature change; El Nino is accompanied by high surface air pressure in the tropical western Pacific, and La Nina by low surface air pressure. The two phases last several months each (typically occurring every few years) and their effects vary in intensity. The Multivariate ENSO Index (MEI) from the National Oceanic and Atmospheric Administration (NOAA) was used as a measure of ENSO.

Drought Modeling
Mishra and Singh [36] reviewed a variety of drought modeling methods and described the components of drought modeling as hydro-meteorological variables, drought indices, climate indices, methodologies, and outputs. Among hydro-meteorological variables, rainfall is the most important variable for meteorological drought forecasting, soil moisture and crop yield are the key variables for agricultural drought forecasting, and stream flow and reservoir level are the most important variables for hydrological drought forecasting. Sometimes many variables are combined to obtain drought characteristics such as drought severity, duration, and spatial extent. Large-scale climate indices such as ENSO or the Arctic Oscillation (AO) index are used to forecast longer droughts. Many methods can be used, including regression models, time-series models, probability models, neural network models, and statistical-dynamic models [36-41].
Recently, drought prediction methods using machine learning have been developed [42,43]. Rules required by expert systems can be developed by human experts or derived by machines based on data provided by human beings; the latter training process is called machine learning [44]. Tadesse et al. [42] developed a rule-based regression tree model forecasting drought conditions and crop yield based on remotely sensed vegetation conditions, SPI, land use, available water capacity of soil, and irrigation areas. Rhee and Im [43] tested decision tree models, random forest models, and extra-trees models to forecast drought indices of the SPI and the Standardized Precipitation-Evapotranspiration Index in South Korea.

Machine Learning Model Design
As an indicator representing true drought conditions, the target variable was set as SPI6_OBS, which is reference SPI6 calculated either using in-situ precipitation data from four rainfall gauges or using the WRF model outputs from 227 pixel locations (Figure 3).
If we were to monitor current drought conditions, we may rely on SPI6_RS, which is SPI6 calculated from remote sensing based rainfall, since reference SPI6 is only available for the past or for some limited locations. However, there are usually gaps between SPI6_RS and SPI6_OBS. In order to explain or reduce the discrepancy, drought-affected input variables of LST_DAY, LST_NIGHT, LST_MEAN, NDVI, and EVI can be included in the model (Figure 3). Elevation (ELEV) can also be included to consider the topographical effect on rainfall, complementing the coarse spatial resolution of remotely sensed rainfall data (Figure 3).
Since the purpose of the model is drought prediction, long-range climate forecasting can be used to estimate the effect of synoptic and large-scale atmospheric circulation. While SPI6_RS was used for training machine learning models assuming a perfect climate forecast, SPI6_FCST was used for testing; SPI6_FCST is SPI6 calculated from bias-corrected precipitation data combining the percent increment of the rainfall anomaly of APCC MME and the climatology of remote sensing based rainfall [45] (Figure 3). A 6-month period of accumulated rainfall was divided into two periods according to the lead time of the forecast: months with observed rainfall and months with forecasted rainfall. Remote sensing-based precipitation data were used as the observed rainfall, and bias-corrected precipitation forecast data were used as the forecasted rainfall. Parameters for the gamma probability distribution functions were pre-fitted based on remote sensing-based precipitation data and used for SPI6_FCST calculations.
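The split of the 6-month accumulation into observed and forecasted months can be illustrated with a short sketch. It is an interpretation, not the authors' code: the bias correction is assumed to scale the remote sensing climatology by (1 + percent anomaly / 100), and the function names, argument layout, and sample numbers are hypothetical.

```python
def bias_corrected_forecast(rs_climatology_mm, mme_percent_anomaly):
    """Apply a forecast percent anomaly to the remote sensing climatology (assumed form)."""
    return max(0.0, rs_climatology_mm * (1.0 + mme_percent_anomaly / 100.0))

def six_month_total(observed_mm, rs_climatology_mm, mme_percent_anomaly, lead):
    """Accumulate 6 - lead observed months and `lead` forecasted months.

    observed_mm: remote sensing monthly totals for the 6 - lead observed months.
    rs_climatology_mm, mme_percent_anomaly: one value per forecasted month.
    """
    if not 1 <= lead <= 6:
        raise ValueError("lead must be between 1 and 6 months")
    if len(observed_mm) != 6 - lead:
        raise ValueError("expected 6 - lead observed months")
    if len(rs_climatology_mm) != lead or len(mme_percent_anomaly) != lead:
        raise ValueError("expected one climatology and anomaly value per forecast month")
    forecast = [
        bias_corrected_forecast(clim, anomaly)
        for clim, anomaly in zip(rs_climatology_mm, mme_percent_anomaly)
    ]
    return sum(observed_mm) + sum(forecast)

# Hypothetical 2-month lead case: four observed months plus two forecasted months.
total = six_month_total(
    observed_mm=[120.0, 80.0, 60.0, 40.0],
    rs_climatology_mm=[50.0, 70.0],
    mme_percent_anomaly=[-20.0, 10.0],
    lead=2,
)
print(total)
```

The resulting total would then be passed through the pre-fitted gamma distribution to obtain SPI6_FCST.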
Month of the data (MONTH) was also included for temporal information, and large-scale circulation indices of MEI and SPCZ strength (SPCZ) were also included (Figure 3). Time points of data vary for 1- to 6-month lead drought prediction; initial points of data were used for remote sensing data and large-scale indices (for example, January 2017 values were used for 3-month lead predictions for April 2017), while target points of data were used for MONTH, SPI6_RS (training), and SPI6_FCST (test).
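The alignment of initial and target time points can be made concrete with a small helper. This is a sketch with hypothetical names and a hypothetical data layout (dictionaries keyed by year and month); the variable names inside the dictionaries follow the text.

```python
def initial_month(target_year, target_month, lead):
    """Return the (year, month) that lies `lead` months before the target month."""
    index = target_year * 12 + (target_month - 1) - lead
    return index // 12, index % 12 + 1

def assemble_predictors(target, lead, initial_values, target_values):
    """Combine initial-time and target-time predictors for one sample.

    initial_values: {(year, month): dict of remote sensing variables and indices}
    target_values:  {(year, month): dict holding SPI6_RS or SPI6_FCST}
    """
    row = dict(initial_values[initial_month(target[0], target[1], lead)])
    row.update(target_values[target])
    row["MONTH"] = target[1]
    return row

# Example from the text: a 3-month lead prediction for April 2017 uses January 2017 values.
initial_values = {(2017, 1): {"NDVI": 0.71, "MEI": -0.2, "SPCZ": 0.1}}  # hypothetical values
target_values = {(2017, 4): {"SPI6_FCST": -0.8}}                        # hypothetical value
sample = assemble_predictors((2017, 4), 3, initial_values, target_values)
print(sample)
```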
The Extra-Trees (ERT hereafter) [46] and the Adaboost [47] models were used as the machine learning models in this study. The implementation was done using the Python library scikit-learn 0.18.1. ERT is known to produce stable results against outliers and noise in training data, and had excellent performance in drought forecasting [43]. Adaboost is a boosting method that combines weak learners; it enables the model to simulate minor characteristics of training data by assigning higher weights during its iterations to the samples that are poorly represented by earlier learners.
The training of the models can be done either using in-situ data or using the WRF model outputs for SPI6_OBS. The models trained using SPI6_OBS based on in-situ data may not be appropriate for other areas because data from only four weather stations are used and the models are trained specifically for those locations. Two cases were compared: in one case, the models were trained using 80% of the WRF model outputs and evaluated using the remaining 20% of the data. In the other case, the models were trained using all in-situ data and evaluated using the same test dataset as in the previous case. Numbers of data samples are shown in Table 3. Although a three-tier approach of training, validation, and testing is often used to optimize parameters for some artificial intelligence models, we used a two-tier approach of training and testing with a fixed number of trees of 100 for ERT and Adaboost and a maximum tree depth of 15 levels. Various numbers of trees and maximum tree depths had been tested using cross-validation of training data; numbers of trees larger than 100 did not produce much difference. Although larger maximum tree depths tend to produce better results, the retrieval of a trained model with a maximum depth larger than 15 levels, including fully grown trees, was very demanding of computational resources.
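A two-tier setup with the stated settings (100 trees, maximum depth of 15, 80/20 split) might look as follows in scikit-learn. This is a sketch under assumptions rather than the study's code: regressors are assumed because SPI6 values are predicted, the depth limit for Adaboost is applied to its decision tree base learner, and the data are synthetic stand-ins for the real predictors.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 10 columns stand in for SPI6_RS, LST_DAY, LST_NIGHT, LST_MEAN,
# NDVI, EVI, ELEV, MONTH, MEI, and SPCZ; the target stands in for SPI6_OBS.
rng = np.random.RandomState(0)
X = rng.rand(500, 10)
y = 2.0 * X[:, 0] - 1.0 + 0.1 * rng.randn(500)

# Two-tier approach: 80% of the samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "ERT": ExtraTreesRegressor(n_estimators=100, max_depth=15, random_state=0),
    # The base learner is passed positionally because its keyword name
    # differs between scikit-learn versions.
    "Adaboost": AdaBoostRegressor(
        DecisionTreeRegressor(max_depth=15), n_estimators=100, random_state=0
    ),
}

mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = float(np.mean(np.abs(model.predict(X_test) - y_test)))
    print(f"{name}: test MAE = {mae[name]:.3f}")
```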

Data Pre-Processing
Remote sensing-based variables of LST_DAY, LST_NIGHT, LST_MEAN, NDVI, EVI, and ELEV were all subset to the extent of 176.5°E-178°W, 21.5°S-12.0°S and then resampled to 0.01° × 0.01° spatial resolution. Since many machine learning models tend to be sensitive to the magnitudes of input variables, these data were scaled using maximum and minimum values of each month for each pixel [48].
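Scaling by the per-month, per-pixel extremes can be sketched as min-max scaling within each calendar month. The exact scaling formula is not given in the text, so the (x − min) / (max − min) form is an assumption, and the function name and sample values are hypothetical.

```python
from collections import defaultdict

def scale_by_calendar_month(series):
    """Min-max scale one pixel's monthly series within each calendar month.

    series: dict mapping (year, month) -> value.
    Returns a dict with the same keys and values scaled to [0, 1].
    """
    by_month = defaultdict(list)
    for (_, month), value in series.items():
        by_month[month].append(value)
    scaled = {}
    for (year, month), value in series.items():
        low, high = min(by_month[month]), max(by_month[month])
        # A constant series has no spread; map it to 0.0 by convention (assumption).
        scaled[(year, month)] = 0.0 if high == low else (value - low) / (high - low)
    return scaled

# Hypothetical NDVI values of one pixel for January and July of three years.
ndvi_pixel = {
    (2003, 1): 0.60, (2004, 1): 0.70, (2005, 1): 0.80,
    (2003, 7): 0.40, (2004, 7): 0.45, (2005, 7): 0.50,
}
scaled = scale_by_calendar_month(ndvi_pixel)
print(scaled)
```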
Since SPI is normally distributed by construction, the numbers of samples in the drought categories of Table 2 are uneven. Because some machine learning models are known to be sensitive to the distribution of samples, the following process was performed when preparing input data: additional input data were created with added noise, obtained by multiplying the standard deviation of the variable for the location and month by a random number between 0 and 1, so that all drought categories have the same number of samples during training.
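The balancing step can be sketched as oversampling with additive noise. This is a simplified illustration with hypothetical names: the standard deviations are passed in per variable (in the study they are specific to location and month), each sample is a dictionary of variables, and the noise is the standard deviation times a uniform random number in [0, 1), following the description literally.

```python
import random

def balance_with_noise(samples_by_category, std_by_variable, seed=0):
    """Oversample smaller categories with noisy copies until all sizes match.

    samples_by_category: {category: list of {variable: value} samples}
    std_by_variable: {variable: standard deviation used to scale the noise}
    """
    rng = random.Random(seed)
    target = max(len(samples) for samples in samples_by_category.values())
    balanced = {}
    for category, samples in samples_by_category.items():
        augmented = [dict(sample) for sample in samples]
        # Empty categories are left empty because there is nothing to copy.
        while samples and len(augmented) < target:
            base = rng.choice(samples)
            augmented.append(
                {name: value + std_by_variable[name] * rng.random()
                 for name, value in base.items()}
            )
        balanced[category] = augmented
    return balanced

# Hypothetical scaled samples: many near-normal cases, few drought cases.
samples = {
    "Near Normal": [{"NDVI": 0.5 + 0.01 * i, "LST_MEAN": 0.4} for i in range(6)],
    "Severe Drought": [{"NDVI": 0.2, "LST_MEAN": 0.8}],
}
balanced = balance_with_noise(samples, {"NDVI": 0.05, "LST_MEAN": 0.1})
print({category: len(rows) for category, rows in balanced.items()})
```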
The thirty-year period from 1981 to 2010 was used for calculating SPI. Due to the short history of MODIS, the input data from July 2002 to 2016 were used for the machine learning models.

Performance Measures
Information on drought index values or corresponding drought categories indicating the severity of drought can be more useful to users than binary information of drought or non-drought. Performance measures used in this study include Total Accuracy, which is the producer's accuracy, and mean absolute error (MAE) for all drought categories in Table 2 (Total MAE hereafter). Although there may not be enough serious drought events during the short study period from July 2002 to 2016, performance measures for only the three drier categories of Extreme Drought, Severe Drought, and Moderate Drought were also used: Drought Accuracy, which is a modified producer's accuracy in Rhee and Im [43] focusing on the three drier categories, and MAE for the three drier categories (Drought MAE hereafter).

$$\text{Total or Drought Accuracy} = \frac{\sum C}{\sum N} \qquad (5)$$
$$\text{Total or Drought MAE} = \frac{\sum \left| \mathrm{SPI6}_{obs} - \mathrm{SPI6}_{pred} \right|}{\text{Total Number of Samples}} \qquad (6)$$
where N is the number of samples for each category, and C is the number of correctly categorized samples for each category. All categories are considered for Total Accuracy and Total MAE, while the three drier categories are considered for Drought Accuracy and Drought MAE.
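The accuracy and MAE measures can be computed as in the sketch below. Several points are assumptions: Table 2 is not reproduced here, so the category thresholds follow the commonly used SPI classification, and the names of the non-drought categories are hypothetical. Accuracy is taken as the number of correctly categorized samples divided by the number of samples in the selected categories, and samples are selected by their observed category.

```python
def spi_category(spi):
    """Map an SPI value to a category (assumed conventional thresholds)."""
    if spi <= -2.0:
        return "Extreme Drought"
    if spi <= -1.5:
        return "Severe Drought"
    if spi <= -1.0:
        return "Moderate Drought"
    if spi < 1.0:
        return "Near Normal"
    if spi < 1.5:
        return "Moderately Wet"
    if spi < 2.0:
        return "Very Wet"
    return "Extremely Wet"

DROUGHT_CATEGORIES = {"Extreme Drought", "Severe Drought", "Moderate Drought"}

def accuracy_and_mae(obs, pred, categories=None):
    """Return (accuracy, MAE) over samples whose observed category is selected."""
    pairs = [
        (o, p) for o, p in zip(obs, pred)
        if categories is None or spi_category(o) in categories
    ]
    if not pairs:
        raise ValueError("no samples in the selected categories")
    correct = sum(spi_category(o) == spi_category(p) for o, p in pairs)
    mae = sum(abs(o - p) for o, p in pairs) / len(pairs)
    return correct / len(pairs), mae

# Hypothetical observed and predicted SPI6 values.
spi6_obs = [-2.3, -1.6, -1.2, 0.1, 0.8, 1.7]
spi6_pred = [-2.1, -1.2, -1.1, -0.2, 1.1, 1.6]
total_accuracy, total_mae = accuracy_and_mae(spi6_obs, spi6_pred)
drought_accuracy, drought_mae = accuracy_and_mae(spi6_obs, spi6_pred, DROUGHT_CATEGORIES)
print(total_accuracy, total_mae, drought_accuracy, drought_mae)
```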

Training of the Models
The machine learning models of ERT and Adaboost were trained using 80% of the WRF model outputs (ERT_WRF and Adaboost_WRF hereafter) or using 100% of the in-situ data (ERT_INSITU and Adaboost_INSITU hereafter). The performance of SPI6 predictions from the simply bias-corrected precipitation forecast (FCST_ONLY hereafter) based on the same training dataset of the WRF model outputs was compared to the performance of ERT and Adaboost (Figure 4). Differences in MAE between methods were also statistically tested using two-sided or one-sided Welch's t-test for both Total MAE and Drought MAE.
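Welch's t-test compares two mean absolute errors without assuming equal variances. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom from two samples of absolute errors; the function name and error values are hypothetical, and obtaining a p-value would additionally require the CDF of Student's t-distribution, which is not part of the Python standard library.

```python
import math
from statistics import mean, variance

def welch_t(errors_a, errors_b):
    """Return Welch's t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(errors_a), len(errors_b)
    var_a = variance(errors_a) / n_a  # squared standard error of mean A
    var_b = variance(errors_b) / n_b  # squared standard error of mean B
    t = (mean(errors_a) - mean(errors_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical absolute errors of two methods on the same set of samples.
abs_err_model = [0.20, 0.30, 0.25, 0.35, 0.30]
abs_err_baseline = [0.50, 0.60, 0.55, 0.65, 0.70]
t_stat, dof = welch_t(abs_err_model, abs_err_baseline)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A negative statistic here indicates a smaller mean absolute error for the first method; a one-sided test asks whether that difference is significant in a stated direction, while a two-sided test asks only whether the means differ.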
Both ERT_WRF and Adaboost_WRF outperformed FCST_ONLY in most cases, and Total MAE and Drought MAE values of ERT_WRF were especially small (Figure 4a,b). The differences were all statistically significant based on two-tailed p-values at a significance level of 0.01 (data not shown). Only Adaboost_WRF with 1-month lead predictions showed larger Drought MAE than FCST_ONLY based on the one-sided t-test (Figure 4b). ERT_WRF outperformed FCST_ONLY and Adaboost_WRF based on the one-sided t-test (data not shown).

Test of the Models
The performance of SPI6 predictions of the machine learning models (ERT_WRF, Adaboost_WRF, ERT_INSITU, and Adaboost_INSITU) as well as FCST_ONLY was evaluated based on the remaining 20% of the WRF model outputs (Figure 6). Differences in MAE between methods were also statistically tested.
ERT_WRF showed the smallest Total MAE, and the differences between ERT_WRF and all other methods were statistically significant based on the one-sided t-test at a significance level of 0.01 (Figure 6a; p-values are not shown). Adaboost_WRF also produced smaller Total MAE compared to FCST_ONLY for 1- to 4-month lead predictions, while the differences were not statistically significant for 5- and 6-month lead predictions (two-tailed p-values are 0.031 and 0.026, respectively). Even ERT_INSITU and Adaboost_INSITU produced significantly smaller Total MAE than FCST_ONLY for 1- to 3-month lead predictions (Figure 6a). Cases that failed to reject the null hypothesis of equal mean error with FCST_ONLY are shaded (Figure 6a).
In contrast to training, where Drought MAE of FCST_ONLY was mostly the largest (Figure 4c), Drought MAE of FCST_ONLY was mostly the smallest for all lead times with the test dataset (Figure 6c). Cases that failed to reject the null hypothesis of equal or larger mean error with FCST_ONLY are shaded based on two-tailed and one-tailed p-values, meaning only these cases produce Drought MAE comparable to FCST_ONLY (Figure 6c; data not shown). In all other cases, the one-sided t-test rejected the null hypothesis that FCST_ONLY has the larger error, meaning that the other methods produced larger Drought MAE in most cases (Figure 6c). There were no obvious differences in Total Accuracy between the methods, although Total Accuracy of ERT_WRF was the highest for all lead times (Figure 6b). FCST_ONLY produced higher Drought Accuracy for 1-month lead SPI6 predictions, while ERT_WRF performed the best for longer-term predictions (Figure 6d). The selection of training data (WRF model outputs versus in-situ data), the selection of a prediction model (FCST_ONLY versus machine learning models of ERT and Adaboost), and the lead time had the greatest effect on Drought Accuracy (Figure 6d).

Scatter plots of reference SPI6 vs. 1-month as well as 3-month lead SPI6 predictions for testing are shown in Figures 7 and 8, respectively.

Spatial Distribution Maps of SPI6 Predictions
Spatially distributed maps of 1- to 6-month lead SPI6 predictions based on FCST_ONLY and ERT_WRF were created. Some examples are shown in Figure 9. To show these alongside both the WRF-based SPI6 map used for training the machine learning models and the in-situ SPI6 map with data from all four weather stations, 21 months with all data available were identified. Although no extreme drought events were observed in these 21 months, the Nadi (91680) station experienced severe droughts in March, June, July, and October 2010.

Relative Importance of Input Variables to Machine Learning Models
Python modules for machine learning models provide information on the relative importance of input variables. The importance of the most important variable is set to 100%, and the relative importance scores of the other input variables are determined with respect to it. In all cases in this study, the most important variable was SPI6_RS, so only the scores of the other input variables are shown in Figure 10.
When in-situ precipitation data were used as reference data, the relative importance of all other input variables was quite low; the score of the second most important variable, MEI, ranged only between 4% and 8% for ERT_INSITU (Figure 10c). For Adaboost_INSITU, the scores of the input variables varied with lead time, but all were below 20% (Figure 10d). The importance of temporal (MONTH) and topographical (ELEV) information as well as large-scale climate indices (SPCZ, MEI) was more evident when the WRF model outputs were used as reference data (Figure 10a,b). For ERT_WRF, the scores of MONTH, MEI, and SPCZ were higher than those of the other input variables, mostly over 20% (Figure 10a). The scores of those three variables as well as ELEV were higher for Adaboost_WRF; the score for MONTH even reached about 55% (Figure 10b).
Differences in the relative importance of the input variables between the sources of reference data indicate that the temporal characteristics of drought occurrences, the effects of ENSO and SPCZ strength, and the topography of the region could not be adequately exploited by the models when in-situ data were used as reference data, because in-situ data are available from only a few stations. The use of the WRF model output precipitation data, on the other hand, enabled the use of diverse information from those variables.
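The scaling of importance scores described at the start of this section, in which the most important variable is set to 100%, can be sketched as below. This is an illustrative sketch only: the function name `relative_importance` is hypothetical, and the raw scores are made-up values, not results from this study. Raw scores of this kind are exposed, for example, by the `feature_importances_` attribute of tree ensembles in scikit-learn, although the source does not state which Python module was used.

```python
def relative_importance(raw_scores):
    """Rescale raw importance scores so that the largest equals 100%."""
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    return {name: 100.0 * score / top for name, score in raw_scores.items()}


# Hypothetical raw importance scores (illustrative values only).
raw = {"SPI6_RS": 0.50, "MONTH": 0.20, "MEI": 0.15, "SPCZ": 0.10, "ELEV": 0.05}
scaled = relative_importance(raw)

# Report every variable except the top-ranked one, as done for Figure 10.
top_name = max(scaled, key=scaled.get)
others = {name: value for name, value in scaled.items() if name != top_name}
```

Scaling by the maximum rather than by the sum keeps the top-ranked variable fixed at 100%, so the remaining scores can be read directly as fractions of its importance.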

Conclusions
We developed hybrid drought prediction models using APCC MME seasonal climate forecasts and machine learning models and examined their performance for the case study area of Fiji. The purpose of the models is to provide spatially distributed, detailed drought prediction data of SPI6 for the area. The APCC MME provides precipitation forecast data with lead times of up to 6 months. Remote sensing data were used to bias-correct the forecast data as well as to train the machine learning models; the machine learning models of ERT and Adaboost were used to provide spatially distributed drought information for ungauged areas. To overcome the limitation of a sparse monitoring network, dynamic downscaling of historical climate with the WRF model was used to produce reference data.
When the performance of the hybrid models trained on different reference data was compared, the models trained using the WRF model outputs performed better than the models trained using in-situ data: ERT_WRF outperformed ERT_INSITU in all cases, and Adaboost_WRF outperformed Adaboost_INSITU except for Drought MAE and Drought Accuracy of 1-month lead predictions, Total MAE and Total Accuracy of 2-month lead predictions, and Total Accuracy of 3-month lead predictions. The superiority of the models trained on the WRF model outputs indicates that the spatial extent of the training data is important, because in-situ data are from only four weather stations. The added value of the topography is clear, especially in the convergence/divergence field over the islands; it strongly affects inland and coastal precipitation and yields greater detail in precipitation in the WRF model outputs [24].
ERT_WRF produced better results than Adaboost_WRF in terms of Total MAE, Total Accuracy, and Drought Accuracy for all lead times, as well as in terms of Drought MAE of 1-month lead predictions. For the other lead times, in terms of Drought MAE, either no statistical difference between ERT_WRF and Adaboost_WRF was found (2- to 4-month lead predictions) or ERT_WRF showed larger error than Adaboost_WRF (5- to 6-month lead predictions). This shows that the choice of the machine learning model matters; the use of simulated input data with added noise to attain the same numbers of samples across drought categories may have improved the performance of ERT enough to surpass the advantage of Adaboost, which is designed to strengthen weak learners.

Figure 1. Topography of Fiji's main islands (color shades are in units of meters).

Figure 2. Location of (a) the rainfall gauges; and (b) the centroids of the Weather Research and Forecasting (WRF) model outputs.

3.5.4. MODIS Land Surface Temperature
Daytime and nighttime land surface temperature (LST) data from the Level-3 standard product of the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard the Aqua satellite, MYD11A2 LST and Emissivity 8-day L3 Global 1 km, were obtained from NASA's Earth Observing System Data and Information System (EARTHDATA) for July 2002 to December 2016. MYD11A2 data are the average of daily MYD11A1 data of cloud-free days. The temporal and spatial resolutions of the data are 8 days and approximately 1 km × 1 km, respectively. The data are in the Sinusoidal projection.

Figure 3. Flow diagram of the drought prediction model.

Figure 6. Test performance (a) Total MAE; (b) Drought MAE; (c) Total Accuracy; and (d) Drought Accuracy of SPI6 predictions from simply bias-corrected precipitation forecast (FCST_ONLY), ERT and Adaboost trained using 80% of the WRF model outputs (ERT_WRF and Adaboost_WRF), and ERT and Adaboost trained using 100% of in-situ data (ERT_INSITU and Adaboost_INSITU). Testing was performed using the remaining 20% of the WRF model outputs.

Figure 9. Spatial distribution maps of 1-month lead SPI6 predictions for March 2010 and June 2010 and WRF-based SPI6.

Ensemble seasonal climate forecast data from the APEC Climate Center (APCC MME) are used to provide climate forecasts with lead times of up to 6 months. Machine learning models are used to provide spatially distributed drought information for ungauged areas. To overcome the limitation of sparse monitoring networks, dynamically downscaled historical climate data from the Weather Research and Forecasting (WRF) model, instead of in-situ data, are used as reference data to train the machine learning models.

Table 1. Fiji rainfall gauges used in the analysis.

Table 2. Drought categories based on Standardized Precipitation Index (SPI) [26].

standardizing, and utilizing climate prediction data from 17 different climate prediction organizations from all around the world. The MME technique collates data from different high-quality climate models, resulting in a better forecast than each climate model's independent forecast. For this study, 6-month MME data produced by the Simple Composite Method (SCM) based on six individual models were obtained from the APEC Climate Data Service System [28]. The six individual climate models were the APCC model, the Centro Euro-Mediterraneo sui Cambiamenti Climatici model, the Meteorological Service of Canada (MSC) model, the National Aeronautics and Space Administration (NASA) model, the National Centers for Environmental Prediction (NCEP) model, the Pusan National University (PNU) model, and the Predictive Ocean Atmosphere Model for Australia.

Table 3. Numbers of data samples used for training and testing.
