2.3. Remote Sensing
The Visible Infrared Imaging Radiometer Suite (VIIRS) is a scanning radiometer onboard the SNPP satellite. VIIRS collects visible and infrared imagery and radiometric measurements of the land, atmosphere, and oceans. It measures radiance in 22 spectral bands, including a "Day/Night band" (DNB) with 750-m resolution. The DNB is sensitive to visible and near-infrared wavelengths across a wide range of light levels, from full daylight down to the low radiance levels found at night. Because it can detect the faint visible light present at night, the DNB is well suited to studying night lights. We downloaded all available VIIRS DNB 5-min swath data from the Level 1 and Atmosphere Archive and Distribution System for our study region [15]. These data are corrected for stray solar light contamination, and pixel geolocations are corrected for changes in elevation. Clouds and cloud shadows were identified using the VIIRS Cloud Mask Intermediate Product, and all DNB pixels with cloud mask values greater than zero were removed.
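The cloud-filtering rule above can be sketched in a few lines. This is a minimal illustration, not the processing code used for the study: it assumes the radiance granule and the cloud mask have already been read into equally sized 2-D grids, with the mask stored as one integer per pixel (0 meaning clear), and it marks removed pixels as None. The function name `apply_cloud_mask` is hypothetical.

```python
from typing import List, Optional


def apply_cloud_mask(
    radiance: List[List[float]],
    cloud_mask: List[List[int]],
) -> List[List[Optional[float]]]:
    """Return a copy of a DNB radiance grid with cloudy pixels set to None.

    Assumption: one integer mask value per pixel; a pixel is kept only when
    its mask value is zero, and any value greater than zero is removed.
    """
    if len(radiance) != len(cloud_mask):
        raise ValueError("radiance and cloud mask must have the same shape")
    filtered: List[List[Optional[float]]] = []
    for rad_row, mask_row in zip(radiance, cloud_mask):
        if len(rad_row) != len(mask_row):
            raise ValueError("radiance and cloud mask must have the same shape")
        # Keep clear pixels, replace cloud/shadow pixels with None.
        filtered.append(
            [value if flag == 0 else None for value, flag in zip(rad_row, mask_row)]
        )
    return filtered


if __name__ == "__main__":
    radiance = [[1.2, 3.4], [0.5, 7.8]]
    mask = [[0, 2], [0, 1]]
    print(apply_cloud_mask(radiance, mask))  # [[1.2, None], [0.5, None]]
```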
A series of statistics was calculated for each DNB stack (the time series of cloud-free DNB values for a pixel), including the (global) mean, median, and standard deviation. Because DNB radiance values are influenced by lunar illumination and phase (DNB values can vary by several orders of magnitude between a new and a full moon), each radiance value was linked to lunar illumination and phase data for the date and time of its image.
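A minimal sketch of the per-pixel summary statistics follows. It assumes each pixel's stack is held as a list of radiance values in date order, with cloud-masked dates stored as None. The function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption; the text does not say which estimator was used.

```python
import statistics
from typing import Dict, List, Optional


def pixel_statistics(stack: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one pixel's DNB time series, skipping cloud-masked (None) dates."""
    clear = [v for v in stack if v is not None]
    if len(clear) < 2:
        raise ValueError("need at least two cloud-free observations")
    return {
        "mean": statistics.mean(clear),
        "median": statistics.median(clear),
        "sd": statistics.stdev(clear),  # sample (n - 1) standard deviation
    }


if __name__ == "__main__":
    print(pixel_statistics([2.0, None, 4.0, 6.0, None, 8.0]))
```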
2.5. Power Outage Classifier
Ideally, power outages could be detected simply by examining the time series of DNB values for a given location and flagging outages with traditional thresholding methods. However, these time series are often noisy, which makes such methods unreliable. This is illustrated in Figure 2, which shows the time series of DNB values for a location in Saharkar Nagar, a suburb of Pune. Note that there is one DNB value for each cloud-free day, measured at the satellite overpass time.
Our approach to using the DNB values to estimate power reliability treats the problem as one of statistical classification. Each pixel of an image, which corresponds to a 750 m × 750 m square, must be classified as showing (i) normal electricity supply conditions; (ii) a power outage; or (iii) an area with no electricity, or with nighttime lights too faint to detect. Machine-learning techniques are well suited to such classification problems when data sets are large and the objective is good out-of-sample prediction, as is the case here [17].
The machine-learning approach we employ is a random forest [18]. This approach makes use of a set of known, correctly classified observations (i.e., satellite images of a particular location for which it is known from ground-level observation whether or not there was a power outage at the time the image was taken). This known data set is used to train a model (or classifier), which can then be applied to observations whose classification is unknown. To evaluate the model, the known data set is randomly partitioned into k subsets (folds) of roughly equal size; the model is trained on k − 1 folds and validated on the remaining fold, and this process is repeated k times so that each fold serves once as the validation set. The advantage of this k-fold cross-validation technique is that all observations are used for both training and validation, and each observation is used for validation exactly once.
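The fold assignment behind k-fold cross-validation can be illustrated with the sketch below. It shuffles observation indices with a fixed seed and deals them round-robin into k folds, so every observation lands in exactly one validation fold. The function name, the seed, and the round-robin assignment are illustrative assumptions, not details of the e1071-based implementation used in the study.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split observation indices 0..n-1 into k folds.

    Returns k (train, validation) pairs: fold i is the validation set of
    pair i, and the other k - 1 folds together form its training set.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


if __name__ == "__main__":
    for train, validation in k_fold_indices(n=10, k=5):
        print(len(train), len(validation))  # 8 2 on each of the five lines
```

With k = 5, as in the five-fold setup used here, each model is trained on roughly 80% of the observations and validated on the remaining 20%.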
To assemble a set of known, correctly classified observations, we used ESMI data from 39 voltage monitoring locations in Maharashtra, across the districts of Nagpur, Akola, Pune, Solapur, Mumbai, and Nashik. The geographic coordinates of the 39 monitoring locations were determined from high-resolution maps of the monitoring locations available from Prayas. Three locations with outage rates above 25% were considered outliers and dropped from the sample.
VIIRS DNB data were extracted for the pixel coincident with each voltage monitoring location, and each pixel value was paired with the voltage reading closest in time to the satellite overpass that generated the image. Following consultation with Prayas, voltage readings below 100 volts were classified as outages, whereas voltage readings above 100 volts were classified as normal supply conditions. The voltage monitoring devices did not record voltages below 100 volts; instead, any voltage below 100 volts was assigned a value of 99 [19]. (Electricity in India is supplied at 220 volts.)
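The pairing and labeling steps can be sketched as follows. The sketch assumes each monitor's readings are available as a time-sorted list of (timestamp, volts) pairs and breaks ties in favor of the earlier reading; both are assumptions. Because the text specifies only readings below and above 100 volts, treating a reading of exactly 100 volts as normal is also an assumption. Function names are hypothetical, and the example readings are synthetic.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Tuple

OUTAGE_THRESHOLD_V = 100.0  # readings below this value are labeled outages


def nearest_reading(readings: List[Tuple[datetime, float]], overpass: datetime) -> float:
    """Return the voltage whose timestamp is closest to the overpass time.

    `readings` must be sorted by timestamp; ties go to the earlier reading.
    """
    if not readings:
        raise ValueError("no voltage readings for this location")
    times = [t for t, _ in readings]
    i = bisect_left(times, overpass)
    # Only the readings just before and just after the overpass can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(readings)]
    best = min(candidates, key=lambda j: abs(times[j] - overpass))
    return readings[best][1]


def label_from_voltage(voltage: float) -> str:
    """Label a reading: below 100 V (logged as 99 by the monitors) is an outage."""
    return "outage" if voltage < OUTAGE_THRESHOLD_V else "normal"


if __name__ == "__main__":
    readings = [
        (datetime(2015, 3, 1, 1, 0), 231.0),
        (datetime(2015, 3, 1, 1, 15), 99.0),
        (datetime(2015, 3, 1, 1, 30), 226.0),
    ]
    v = nearest_reading(readings, datetime(2015, 3, 1, 1, 20))
    print(v, label_from_voltage(v))  # 99.0 outage
```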
In addition to the 39 locations above, at which electricity is supplied and voltage is monitored, 26 locations with no electricity, in or adjacent to Maharashtra, were included in the known data set. These locations correspond to ocean, lakes, forests, and other areas of naturally occurring vegetation. In all, the known data set consisted of 15,393 observations across 65 training sites for the period 1 January 2015 to 10 September 2015. These data were split into training and testing samples, with 75% of observations reserved for training.
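A 75/25 split of the known data set can be sketched as a seeded shuffle followed by a cut. The text does not say whether the split was random by observation, stratified by class, or grouped by site; this sketch assumes a simple random split by observation, rounds the training size to the nearest integer, and uses a hypothetical function name.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_known_data(
    rows: List[T], train_frac: float = 0.75, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly reserve `train_frac` of the rows for training and the rest for testing."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Applied to 15,393 observations, this rule would reserve 11,545 for training and 3,848 for testing.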
The known data set was used to train a random forest classifier using one of two loss functions (described in the next section) with five-fold cross-validation, implemented with R's e1071 package [20]. The model is of the following form:

$$\text{lights\_out} = F\left(\text{dnb},\ \text{illum},\ \text{phase},\ \text{md\_dnb},\ \text{dnb} - \text{md\_dnb}\right)$$
Whether an image pixel reflects a power outage, normal electricity supply, or an area with no electricity (lights_out) depends on its DNB radiance value (dnb), lunar illumination (illum) and phase (phase), the (global) median DNB value (md_dnb) for that pixel, and the difference between the DNB value at the time the image was taken and the median DNB value for that pixel (dnb − md_dnb). The random forest approach implicitly determines the form of the relationship (F(∙)) between the predictors and the outcome. The relationship is not constrained to being linear, or even smooth. The approach also allows for interactions among the predictors.
As noted earlier, lunar illumination and phase have a strong influence on DNB values, particularly in rural communities, hence the inclusion of these variables as predictors. The median DNB value is derived from the time series of DNB values for a given pixel. The inclusion of this variable as a predictor facilitates identification of areas with no electricity. The difference between a DNB value for a pixel and the median DNB value for that pixel facilitates identification of outages.
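The predictor set in the model above can be assembled per pixel as sketched below. This is an illustrative sketch, not the study's code: it assumes each cloud-free observation already carries its lunar illumination and phase values (linked by image date and time, as described in Section 2.3), and that md_dnb is the median over the pixel's cloud-free time series. The class and function names are hypothetical, and the units of illum and phase are left unspecified, as in the text.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    dnb: float    # cloud-free DNB radiance at the overpass time
    illum: float  # lunar illumination at the image date and time
    phase: float  # lunar phase at the image date and time


def build_features(series: List[Observation]) -> List[List[float]]:
    """Return one predictor row per observation of a single pixel:
    [dnb, illum, phase, md_dnb, dnb - md_dnb]."""
    if not series:
        raise ValueError("empty time series")
    # md_dnb: median radiance over the pixel's whole cloud-free time series.
    md_dnb = statistics.median(obs.dnb for obs in series)
    return [[obs.dnb, obs.illum, obs.phase, md_dnb, obs.dnb - md_dnb] for obs in series]


if __name__ == "__main__":
    pixel = [
        Observation(10.0, 0.1, 0.2),
        Observation(2.0, 0.1, 0.2),
        Observation(12.0, 0.9, 0.5),
    ]
    for row in build_features(pixel):
        print(row)
```

Each row would then be paired with its lights_out label (normal supply, outage, or no electricity) before being passed to a random forest implementation.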